Abstract

BATched Sparse codes (BATS codes) are proposed for transmitting a collection of packets through a
communication network with packet loss. A BATS code consists of an inner code and an outer code over a finite field.
The outer code is a matrix generalization of a fountain code that preserves desirable properties of the latter such as
ratelessness and low encoding/decoding complexity. The outer code encodes the file to be transmitted into batches,
each containing M packets. When the batch size M is equal to 1, the outer code reduces to a fountain code.
The inner code is comprised of the linear network coding performed by the intermediate network nodes. With the 
constraint that linear network coding is applied only to packets within the same batch, the structure of the outer code 
is preserved. Furthermore, the computational capability of the intermediate network nodes required to apply BATS 
codes is independent of the number of packets for transmission. For tree networks, the size of the buffer required 
at the intermediate nodes is also independent of the number of packets for transmission. It is verified theoretically 
for certain cases and demonstrated numerically for some general cases that BATS codes asymptotically achieve rates 
very close to the capacity of the underlying networks. 

Index Terms 

Network coding, fountain codes, sparse graph codes, belief propagation. 

I. Introduction 

One fundamental task of communication networks is to distribute a bulk of digital data, called a file, from a 
source node to a set of destination nodes. We consider this file distribution problem, called multicast, in packet 
networks, in which data packets transmitted on the network links can be lost due to channel noise, congestion, 
faulty network hardware, and so on. 

Existing network protocols, for example TCP, mostly use retransmission to guarantee reliable transmission of
individual packets. Retransmission relies on feedback and is not scalable for multicast transmission. On the other
hand, fountain codes, including LT codes [1], Raptor codes [2] and online codes [3], provide a good solution for
routing networks without relying on feedback, where the intermediate nodes apply store-and-forward. When using
fountain codes, the source node keeps transmitting coded packets generated by a fountain code encoder, and a
destination node can decode the original file after receiving n coded packets, where n is typically only slightly
larger than the number of input packets, regardless of which n packets are received. Fountain codes have the
advantages of ratelessness, universality, and low encoding/decoding complexity. Taking Raptor codes as an example,
both the encoding and the decoding of a packet have constant complexity.

Routing, however, is not an optimal operation at the intermediate nodes in the presence of packet loss from
the throughput point of view. For example, the routing capacity of the network in Fig. 1 is 0.64 packet per use.
If we allow decoding and encoding operations at the intermediate node and treat the network as a concatenation
of two erasure channels, we can achieve the rate 0.8 packet per use by using erasure codes on both links. For
a general network, the maximum multicast rate can be achieved only by network coding [4]. Network coding
allows an intermediate node to generate and transmit new packets using the packets it has received. Linear network
coding [5] was proved to be sufficient for multicast communications and can be realized in a distributed manner by
random linear network coding [6]-[9].
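As a quick check of the two rates quoted above, assume (as stated in the caption of Fig. 1) that each of the two links independently loses a packet with probability 0.2. With store-and-forward, a packet is useful to t only if it survives both links, whereas with coding at the intermediate node each link can be used up to its own erasure capacity:

\[
R_{\text{routing}} = (1 - 0.2)(1 - 0.2) = 0.64,
\qquad
R_{\text{coding}} = \min\{1 - 0.2,\; 1 - 0.2\} = 0.8 .
\]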

The following network coding method has been proved to achieve the multicast capacity for networks with packet
loss in a wide range of scenarios [10], [11]. The source node transmits random linear combinations of the input
packets and an intermediate node transmits random linear combinations of the packets it has received. Note that 
no erasure codes are required for each link though packet loss is allowed. Network coding itself plays the role of 
end-to-end erasure codes. A destination node can decode the input packets when it receives enough coded packets 
with linearly independent coding vectors. 
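The scheme just described can be illustrated by a minimal sketch on the three-node network of Fig. 1. It is not taken from the paper: it assumes the field GF(2), stores each packet and each coding vector as a Python integer used as a bitmask, and the names `gf2_solve` and `rlnc_three_node` are hypothetical. The intermediate node keeps every packet it receives, which is exactly the storage cost that grows with the file size.

```python
import random

def gf2_solve(vectors, payloads, K):
    """Gauss-Jordan elimination over GF(2) on (coding vector, payload) pairs.

    Returns the K input payloads, or None if the coding vectors do not
    have rank K yet."""
    rows = [[v, y] for v, y in zip(vectors, payloads)]
    for bit in range(K):
        piv = next((i for i in range(bit, len(rows)) if (rows[i][0] >> bit) & 1), None)
        if piv is None:
            return None
        rows[bit], rows[piv] = rows[piv], rows[bit]
        for i, row in enumerate(rows):
            if i != bit and (row[0] >> bit) & 1:
                row[0] ^= rows[bit][0]
                row[1] ^= rows[bit][1]
    return [rows[bit][1] for bit in range(K)]

def rlnc_three_node(inputs, loss=0.2, seed=0, max_uses=10_000):
    """Simulate s -> a -> t with random linear combinations at s and a."""
    rng = random.Random(seed)
    K = len(inputs)
    source = [(1 << i, b) for i, b in enumerate(inputs)]
    relay, sink = [], []          # packets buffered at a, packets received at t

    def combine(pairs):
        v = y = 0
        for pv, py in pairs:
            if rng.getrandbits(1):   # random GF(2) coefficient
                v ^= pv
                y ^= py
        return v, y

    for use in range(1, max_uses + 1):
        pkt = combine(source)
        if rng.random() >= loss:     # link s -> a
            relay.append(pkt)
        if relay:
            pkt = combine(relay)
            if rng.random() >= loss: # link a -> t
                sink.append(pkt)
        decoded = gf2_solve([v for v, _ in sink], [y for _, y in sink], K)
        if decoded is not None:
            return decoded, use
    return None, max_uses
```

Calling `rlnc_three_node(inputs)` returns the decoded packets together with the number of channel uses that were needed; no per-link erasure code appears anywhere in the sketch.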

The above scheme, referred to as the baseline random linear network coding scheme, has been implemented in
wireline peer-to-peer (P2P) networks [12], [13] (see [14] for a network coding analysis), in which every node in the
network is required to decode the file. However, the computational and storage complexities of this scheme are not
suitable for many other practical applications, in particular wireless applications. Consider transmitting K packets
where each packet consists of T symbols in a finite field. The computational complexity of encoding in the source
node is $O(TK)$ per packet. An intermediate node needs to buffer all the packets it has received for network coding,
so in the worst case, the storage cost is K packets, and the computational complexity of encoding is $O(TK)$ per
packet. Decoding using Gaussian elimination has complexity $O(K^2 + TK)$ per packet. Though these complexities
are polynomial in K, the baseline random linear network coding scheme is still difficult to implement for large
K. In particular, the intermediate nodes, like network routers, usually have limited buffer capability. Since the size
of the required buffer at the intermediate node depends on the file size, such an implementation cannot handle an
arbitrarily large file.

In practice, we hope to build network coding enabled devices with limited storage and computational capabilities.
Accordingly, it is desirable for a network coding scheme to have i) low encoding complexity in the source node
and low decoding complexity in the destination nodes, ii) constant computational complexity of encoding a packet
in an intermediate node and constant buffer requirement in an intermediate node, iii) small protocol overhead, and
iv) high transmission rate.

Fig. 1. In this network, s is the source node, t is the destination node, and a is the intermediate node that does not demand the file. Both
links are capable of transmitting one packet per use and have a packet loss rate 0.2.

A. Some Previous Works 

There are roughly two classes of works for designing efficient file transmission schemes in networks with coding 
at the intermediate nodes, but they either cannot meet our requirement at the intermediate nodes or have other 
drawbacks. 

The first class of works tries to extend fountain codes to networks with coding at the intermediate nodes. Since
coding in the intermediate nodes changes the degrees of the packets, it is difficult to guarantee that the degrees
of the received packets follow a specific distribution. Solutions have been proposed for special network topologies
(e.g., line networks [15], [16]) and special communication scenarios (e.g., peer-to-peer file sharing [17], [18]), but
those solutions are difficult to extend to general network settings and cannot meet our requirement for the
intermediate nodes. For example, all the schemes proposed in [15]-[18] require the intermediate nodes to have a
buffer size that increases linearly with the number of packets for transmission.

The second class of works tries to reduce the complexity of linear network coding using chunks Q. A chunk
(also called generation or class) is a subset of the packets for transmission. Encoding, recoding and decoding are
all performed within one chunk. When the chunks are disjoint and have size L, this reduces the encoding and
decoding complexity to $O(TL)$ and $O(L^2 + TL)$ per packet, respectively, but at the same time introduces the
issue of scheduling the chunks. Specifically, sequential scheduling of chunks requires feedback and is not scalable for multicast,
while random scheduling of chunks requires the intermediate nodes to cache all the chunks [19]-[22]. For a detailed
discussion on the scheduling issues, we refer the reader to [23].

B. Our Solution 

To address the issues of the existing schemes, we propose a solution called BATched Sparse codes (BATS codes), 
which extends fountain codes to the realm of networks and at the same time incorporates random linear network 
coding. A BATS code consists of an inner code and an outer code over a finite field. The outer code is a matrix 
generalization of a fountain code, and hence rateless. The outer code encodes the file to be transmitted into batches, 
each containing M packets. When the batch size M is equal to 1, the outer code reduces to a fountain code. 
The inner code is comprised of the linear network coding performed by the intermediate network nodes. The only 
constraint on the linear network coding scheme (other than causality) is that only packets belonging to the same 
batch can be combined. Since the inner code does not change the structure of the outer code, an efficient belief 
propagation (BP) decoding algorithm can be used to decode BATS codes. When the batch size M is equal to K,
a BATS code can become the baseline random linear network coding scheme (see the discussion in Section II-E).

When applying BATS codes, the encoding complexity of the outer code is $O(TM)$ per packet and the
corresponding decoding complexity is $O(M^2 + TM)$ per packet. An intermediate node uses $O(TM)$ time to recode a
packet, and an intermediate node is required to buffer only $O(M)$ packets for tree networks, including the
three-node network in Fig. 1. Note that all these requirements for BATS codes do not depend on K, the total number of
packets for transmission.

BATS codes are suitable for any network that allows linear network coding at the intermediate nodes. These 
codes are robust against dynamical network topology and packet loss since the end-to-end operation remains linear. 
Moreover, BATS codes can operate with small finite fields. In contrast, most existing random linear network coding 
schemes require a large field size to guarantee a full rank for the transfer matrix. For BATS codes, as we will see, 
the transfer matrices of the batches are allowed to have arbitrary rank deficiency. 

Even though the underlying network can vary, the performance of a BATS code can be evaluated independent of 
the details of the intermediate operations and network topologies given the ranks of the transfer matrices applied on 
the batches. We use density evolution to analyze the BP decoding process of BATS codes, and obtain a sufficient 
and a necessary condition for the BP decoding succeeding with high probability. 

A near optimal degree distribution for a BATS code can be obtained by solving an optimization problem induced 
by the sufficient condition. The optimization problem can be approximately solved by linear programming. When 
the empirical distribution of the transfer matrix ranks converges to a probability vector $(h_0, h_1, \ldots, h_M)$, we verify
theoretically for certain cases and demonstrate numerically for some general cases that BATS codes can achieve
rates very close to $\sum_{i} i h_i$, the maximum achievable rate in terms of packets per use for such transfer matrices.

In the rest of this paper, BATS codes are formally introduced in Section II. The belief propagation decoding of
BATS codes is analyzed in Section III. An optimization of the degree distribution is discussed in Section IV. An
example of how to use BATS codes in the three-node network is given in Section V.

II. BATS Codes 

In this section, we discuss the encoding and decoding of BATS codes. Consider encoding K input packets, each
of which has T symbols in a finite field $F$ with size q. A packet is denoted by a column vector in $F^T$. The rank
of a matrix A is denoted by rk(A). In the following discussion, we equate a set of packets to a matrix formed by
juxtaposing the packets in this set. For example, we denote the set of the input packets by the matrix

\[ B = [b_1, b_2, \ldots, b_K], \]

where $b_i$ is the ith input packet. When treating the packets as a set, with an abuse of notation, we also write $b_i \in B$,
$B' \subset B$, etc.



Fig. 2. Tanner graph for encoding and transmitting of the first five batches. Nodes in the first row are the variable nodes representing the input
packets. Nodes in the second row are the check nodes representing the batches.

A. Encoding of Batches 

A batch is a set of M coded packets generated from a subset of the K input packets. For $i = 1, 2, \ldots$, the ith
batch $X_i$ is generated from a subset $B_i \subset B$ of the input packets by the operation

\[ X_i = B_i G_i, \]

where $G_i$, a matrix with M columns, is called the generator matrix of the ith batch. We call the packets in $B_i$ the
contributors of the ith batch. The formation of $B_i$ is specified by a degree distribution $\Psi = (\Psi_0, \Psi_1, \ldots, \Psi_K)$: 1)
sample the distribution $\Psi$, which returns a degree $d_i$ with probability $\Psi_{d_i}$; 2) uniformly at random choose $d_i$ input
packets to form $B_i$. The design of $\Psi$ is discussed later in Section IV.

The dimension of $G_i$ is $d_i \times M$. In this paper, we analyze BATS codes with random generator matrices.
Specifically, all the components of $G_i$ are independently and uniformly chosen at random by the encoder. Such a
random matrix is also called a totally random matrix. Random generator matrices not only facilitate analysis
but are also readily implementable. For example, $G_i$, $i = 1, 2, \ldots$, can be generated by a pseudorandom number
generator and can be recovered at the destinations by the same pseudorandom number generator.
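The two sampling steps and the product of the contributors with a random generator matrix can be sketched as follows. This is an illustration under assumptions that the text does not fix: the field is GF(2), packets are Python integers used as bitmasks, each column of the generator matrix is stored as a $d_i$-bit mask, and the function name `encode_batch` and the per-batch seed are hypothetical.

```python
import random

def encode_batch(inputs, degree_dist, M, batch_seed):
    """Generate one batch over GF(2).

    inputs      -- list of K input packets (ints used as bitmasks)
    degree_dist -- weights (Psi_0, Psi_1, ..., Psi_D) of the degree distribution
    M           -- batch size
    batch_seed  -- seed shared with the destination so that the contributors
                   and the generator matrix can be regenerated there
    Returns (contributors, generator columns, list of M coded packets)."""
    rng = random.Random(batch_seed)
    # Step 1: sample the degree d_i from the degree distribution.
    d = rng.choices(range(len(degree_dist)), weights=degree_dist)[0]
    # Step 2: choose d_i distinct input packets uniformly at random.
    contributors = rng.sample(range(len(inputs)), d)
    # Totally random d_i x M generator matrix; column j is a d_i-bit mask.
    G_cols = [rng.getrandbits(d) if d else 0 for _ in range(M)]
    batch = []
    for col in G_cols:
        pkt = 0
        for row, idx in enumerate(contributors):
            if (col >> row) & 1:
                pkt ^= inputs[idx]      # addition in GF(2) is XOR
        batch.append(pkt)
    return contributors, G_cols, batch
```

Because all randomness comes from `random.Random(batch_seed)`, a destination that knows the seed can call the same function to recover the contributors and the generator matrix of a batch without receiving them explicitly.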

The code described above, called the outer code of the BATS code, can be described by a Tanner graph. A
Tanner graph has K variable nodes, where variable node i corresponds to the ith input packet $b_i$, and n check
nodes, where check node j corresponds to the jth batch $X_j$. Check node j is connected to variable node i if $b_i$ is
a contributor of $X_j$. Fig. 2 illustrates an example of a Tanner graph for encoding.

B. Transmission of Batches 

To transmit a batch, the source node transmits the packets in the batch, not necessarily in the order they are 
generated. No feedback is required to stop the transmission of each batch. A BATS code can be used as a rateless 
code, i.e., the number of batches transmitted is not fixed and is potentially unlimited. An intermediate node encodes 
the received packets within the same batch into new packets by taking random linear combinations and transmits 
these new packets on the outgoing links, i.e., random linear network coding is applied to packets belonging to the 
same batch. These new packets so generated are regarded as belonging to the same batch. The rule is that packets 
belonging to different batches are not mixed inside the network. BATS codes are robust against dynamical network
topology and packet loss since the end-to-end operation remains linear. The random linear network coding applied
on batches is referred to as the inner code of the BATS code.

To apply BATS codes, we further need to consider how to schedule the transmission of batches at the source 
node and at the intermediate nodes, and how to manage the buffers at the intermediate nodes. The design of these 
network operations varies for different scenarios. For the file distribution in a P2P network, since all the network 
nodes request the file, random scheduling of the batches can reduce the protocol overhead. In contrast, since the 
intermediate node in the three-node network in Fig. 1 does not require the file, sequential scheduling of the batches
at both the source node and the intermediate node can minimize the buffer requirement at the intermediate node.
As we will show in Section V, caching one batch in the intermediate node is asymptotically optimal. The point is
that the intermediate node always receives the packets of the same batch consecutively. Since only the packets of 
the same batch can be combined by network coding, it is not necessary to keep the batches whose transmission has 
been completed by the intermediate node. Note that the completion of the transmission of a batch at the intermediate 
node is signaled by the reception of the first packet of the next batch. Similarly, for tree networks with the root being 
the source node, sequential scheduling of the batches can also minimize the buffer requirement at the intermediate 
nodes. 
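The buffer management described above for sequential scheduling can be sketched as a small class. It is only an illustration: it assumes GF(2), packets and coding vectors stored as Python integers, and a batch identifier carried in each packet header; the class name `BatchRelay` and its methods are hypothetical.

```python
import random

class BatchRelay:
    """Intermediate node that buffers only the batch currently in transit."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.batch_id = None
        self.buffer = []   # (coding vector, payload) pairs of the current batch

    def receive(self, batch_id, vector, payload):
        # The first packet of a new batch signals that the previous batch
        # is complete, so its packets can be discarded.
        if batch_id != self.batch_id:
            self.batch_id = batch_id
            self.buffer = []
        self.buffer.append((vector, payload))

    def recode(self):
        """Return a random GF(2) combination of the buffered packets.

        Only packets of the same batch are ever mixed."""
        vector = payload = 0
        for v, y in self.buffer:
            if self.rng.getrandbits(1):
                vector ^= v
                payload ^= y
        return self.batch_id, vector, payload
```

The buffer never holds more than the packets of one batch, so its size is bounded by the number of packets received for that batch and does not depend on K.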

Given the end-to-end transformations applied to the batches, the design of the outer code does not depend on 
the details of the network operations. So we will not discuss the detailed network operations on the batches in 
general networks. Nevertheless, we demonstrate how a BATS code works in the three-node network in Section V,
where some general guidelines on the design of the intermediate operation are given. Note that though the three-node
network is simple, it models many situations that arise in multiple-hop transmissions in wireline and wireless
communications.

Let $Y_i$ be the received packets at a destination node that belong to the ith batch. We write

\[ Y_i = X_i H_i = B_i G_i H_i, \tag{1} \]

where $H_i$ is the transfer matrix incurred by the linear network coding operation of the network 0, [24] for the
ith batch. The number of rows of $H_i$ is M, while the number of columns varies for different batches and is finite.
We assume that $H_i$ is known by the destination node through the coding vectors in the packet headers. When the
packet length T is sufficiently large, this overhead is negligible. See the introduction to linear network coding in
[25] for more details.

The operation of the network on the batches in (1) can be modeled as a linear operator channel (LOC), which
has been studied for linear network coding [26]-[28]. The outer code of a BATS code can be regarded as a channel
code for the LOC. In the analysis of BATS codes, we assume that the empirical rank distribution of the transfer
matrices converges in probability to a probability vector. This is a mild assumption since it does not require the
ranks of the transfer matrices to be i.i.d. as in lETI, [28]. See Appendix I for more discussion and a characterization
of the capacity of such LOCs.

Fig. 3. A decoding graph. Nodes in the first row are the variable nodes representing the input packets. Nodes in the second row are the check
nodes representing the batches.

C. Belief Propagation Decoding 

A destination tries to decode the input packets using $Y_i$ and the knowledge of $G_i H_i$ for $i = 1, 2, \ldots, n$. The
decoding process is better described using the bipartite graph in Fig. 3, which is the same as the encoding graph
in Fig. 2, except that associated with each check node i is the matrix $G_i H_i$.

A check node i is called decodable if $G_i H_i$ has rank $d_i$, the degree of the ith batch. If so, then $B_i$ is recovered
by solving the linear system of equations $Y_i = B_i G_i H_i$, which has a unique solution since $\mathrm{rk}(G_i H_i) = d_i$. After
decoding the ith batch, we recover the $d_i$ input packets in $B_i$. We then substitute the values of these input packets in
the undecoded batches. Consider that $b_k$ is in $B_i$. If variable node k has only one edge that connects with check
node i, just remove variable node k. If variable node k also connects to check node $j \neq i$, then besides removing
the variable node, also remove the row in $G_j H_j$ corresponding to variable node k. In the decoding graph, this is
equivalent to first removing check node i and its neighboring variable nodes, and then for each removed variable
node updating its neighboring check nodes. We repeat this decoding-substitution procedure on the new graph until
no more check nodes are decodable.
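The decoding-substitution loop can be sketched as follows. This is an illustrative sketch rather than the decoder analyzed in the paper: it assumes GF(2), packets stored as Python integers, and that each received packet of a batch carries a mask whose bit k is set when input packet $b_k$ has a nonzero coefficient in the corresponding column of $G_i H_i$. The names `solve_batch` and `bp_decode` are hypothetical.

```python
def solve_batch(packets, unknowns):
    """Gauss-Jordan over GF(2) on [mask, payload] pairs of one batch.

    Returns {input index: value} if the masks have rank len(unknowns)
    (the check node is decodable), otherwise None."""
    rows = [list(p) for p in packets]
    order = sorted(unknowns)
    for used, k in enumerate(order):
        bit = 1 << k
        piv = next((i for i in range(used, len(rows)) if rows[i][0] & bit), None)
        if piv is None:
            return None
        rows[used], rows[piv] = rows[piv], rows[used]
        for i, row in enumerate(rows):
            if i != used and row[0] & bit:
                row[0] ^= rows[used][0]
                row[1] ^= rows[used][1]
    return {k: rows[i][1] for i, k in enumerate(order)}

def bp_decode(batches):
    """batches: list of (contributor indices, list of (mask, payload)).

    Repeatedly decodes every batch whose rank equals its remaining degree and
    substitutes the recovered packets into the other batches."""
    state = [(set(c), [list(p) for p in pkts]) for c, pkts in batches]
    decoded = {}
    progress = True
    while progress:
        progress = False
        for unknowns, pkts in state:
            if not unknowns:
                continue
            sol = solve_batch(pkts, unknowns)
            if sol is None:
                continue                 # rank < degree: not decodable yet
            decoded.update(sol)
            progress = True
            for other_unknowns, other_pkts in state:
                for k, value in sol.items():
                    if k in other_unknowns:
                        other_unknowns.discard(k)   # degree drops by one
                        bit = 1 << k
                        for p in other_pkts:
                            if p[0] & bit:          # remove the row of b_k
                                p[0] ^= bit
                                p[1] ^= value
    return decoded
```

The function returns whatever input packets could be recovered; if decoding stalls because no remaining check node has rank equal to its degree, the result simply contains fewer than K entries.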

The degree distribution is the crucial parameter that affects the performance of the BP decoding. We want to 
design a degree distribution such that i) the BP decoding succeeds with high probability, ii) the encoding/decoding 
complexity is low, and iii) the coding rate is high. Based on the analysis of the decoding process in Section III, an
optimization of the degree distribution will be provided in Section IV.

D. Precoding of BATS Codes 

The precoding technique of Raptor codes is applied here to reduce the encoding/decoding complexity of BATS codes.
The input packets are first encoded using a traditional erasure code (precode), and then encoded by a BATS code.
We require that the belief propagation decoding of the BATS code recovers a given fraction of its input packets. The
traditional erasure code is capable of recovering the original input packets in the face of a fixed fraction of erasures.
Fig. 4 demonstrates a systematic precode together with a BATS code.

E. Computation Complexity 

The complexity of encoding a batch with degree d is $O(TMd)$. For an encoding graph with n check nodes, i.e., n
batches, the encoding complexity is $O(TM\sum_{i=1}^{n} d_i)$, which converges to $O(TMn\,E[\Psi])$ when n is large, where
$E[\Psi] = \sum_{d} d\,\Psi_d$ is the expected degree.

Fig. 4. Precoding of BATS codes. Nodes in the first row represent the input packets. Nodes in the second row represent the intermediate
packets generated by the precode. Nodes in the third row represent the batches generated by the encoding of a BATS code.

Let $k_i = \mathrm{rk}(H_i)$ and let $k_i'$ be the rank of $G_i H_i$ when check node i is decodable. It is clear that $k_i' \le k_i \le M$.
The decoding process involves two parts: the first part is the decoding of the decodable check nodes, which
has complexity $O(\sum_i k_i'^3 + T\sum_i k_i'^2)$; the second part is updating the decoding graph, which has complexity
$O(T\sum_i (d_i - k_i')M)$. So the total complexity is $O(\sum_i k_i'^3 + T\sum_i k_i'^2 + T\sum_i (d_i - k_i')M)$, which can be simplified
to $O(nM^3 + TM\sum_i d_i)$ since $k_i' \le M$. When n is large, the complexity converges to $O(M^3 n + TMn\,E[\Psi])$. Usually, T and
$E[\Psi]$ are larger than M, so the second term is dominant.

We will see from Section IV that we can find a degree distribution with $E[\Psi] = O(M)$. In the design of BATS
codes, M is a parameter independent of K. The rate of the code is $K/(nM)$ packets per transmission. When the rate
of the code converges to a constant value, we see that the encoding and decoding complexity are $O(TKM)$ and
$O(KM^2 + TKM)$, respectively.
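The substitution behind these two expressions can be written out in one line each. The sketch assumes, as an interpretation of "the rate converges to a constant", that the total number of transmitted packets nM grows in proportion to K, i.e., $nM = \Theta(K)$, together with $E[\Psi] = O(M)$:

\[
O(TMn\,E[\Psi]) = O(TM \cdot nM) = O(TKM),
\qquad
O(M^3 n + TMn\,E[\Psi]) = O(M^2 \cdot nM + TM \cdot nM) = O(KM^2 + TKM).
\]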

The batch size M determines the tradeoff between the complexity and the maximum achievable rate. When 
M = 1, a BATS code degenerates to a Raptor code, which has the lowest computation complexity but cannot get 
the benefit of network coding. When M = K and the degrees of all batches are K, a BATS code becomes the 
baseline random linear network coding scheme. In the second case, though the complexity is high, the potential of 
network coding can be fully realized. 

III. Decoding Analysis 

Some existing methods for analyzing the BP decoding of erasure codes can be modified to analyze the BP
decoding of BATS codes. In this paper, we adopt the differential equation approach [29] that has been used in [30]
(see also OJJ).

Compared with fountain codes, BATS codes have a relatively complex decoding criterion that
involves both the degree and the rank of a check node. In addition to the evolution of the degrees of the
check nodes, the evolution of the ranks of the check nodes also needs to be tracked in the decoding analysis.
A. Random Decoding Graph

Consider a random decoding graph with K variable nodes and n check nodes. Fix a degree distribution
$\Psi = (\Psi_0, \Psi_1, \ldots, \Psi_D)$, where D is the maximum integer such that $\Psi_D$ is nonzero. Assume that $D = O(M)$. The
feasibility of this assumption will be justified later. The degree $d_i$ of check node i is obtained by sampling the
degree distribution $\Psi$. The $d_i$ neighbors of check node i are uniformly chosen, and the generator matrix $G_i$ of check
node i is a $d_i \times M$ totally random matrix, i.e., its components are uniformly i.i.d.

Let $H_i$ be the transfer matrix associated with check node i. Assume that the empirical distribution of the transfer
matrix ranks converges in probability to a probability vector $h = (h_0, \ldots, h_M)$. Specifically, for $k = 0, \ldots, M$ let

\[ \pi_k \triangleq \frac{|\{i : \mathrm{rk}(H_i) = k\}|}{n}. \]

Note that $\pi_k$ depends on n. We assume that the convergence of the matrix ranks satisfies

\[ |\pi_k - h_k| = O(n^{-1/6}), \quad 0 \le k \le M, \tag{2} \]

with probability at least $1 - \gamma(n)$, where $\gamma(n) = o(1)$, i.e., there exists a constant c such that for all sufficiently
large n,

\[ \Pr\{|\pi_k - h_k| \le c\,n^{-1/6},\ 0 \le k \le M\} \ge 1 - \gamma(n), \]

and

\[ \lim_{n \to \infty} \gamma(n) = 0. \]

Note that the above assumption on the convergence of $\{\pi_k\}$ is valid when $\{H_i\}$ are i.i.d. and $\mathrm{rk}(H_i)$ follows the
distribution h. We also assume that the transfer matrices are independent of the generation of batches. The random
decoding graph of a BATS code described above is denoted by BATS(K, n, $\Psi$, h).
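The empirical rank distribution is straightforward to compute from observed transfer matrix ranks. The following sketch (function names are hypothetical) also evaluates the weighted sum $\sum_k k\,\pi_k$, the empirical counterpart of the quantity $\sum_i i h_i$ that the Introduction quotes as the maximum achievable rate:

```python
from collections import Counter

def empirical_rank_distribution(ranks, M):
    """Return (pi_0, ..., pi_M), where pi_k is the fraction of batches
    whose transfer matrix has rank k."""
    n = len(ranks)
    counts = Counter(ranks)
    return [counts.get(k, 0) / n for k in range(M + 1)]

def expected_rank(pi):
    """Return sum_k k * pi_k for a rank distribution pi."""
    return sum(k * p for k, p in enumerate(pi))
```

For example, with M = 2 and observed ranks 2, 2, 1, 0, 2, 1, 2, 2, the distribution is (1/8, 2/8, 5/8) and the expected rank is 1.5.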

We call $r_i = \mathrm{rk}(G_i H_i)$ the rank of check node i. Define the following two regions of the degree-rank pair:

\[ \bar{\mathcal{F}} = \{(d, r) : 1 \le r \le M,\ r \le d \le D\}, \]
\[ \mathcal{F} = \{(d, r) : 1 \le r \le M,\ r < d \le D\}. \]

We see that $\bar{\mathcal{F}} = \mathcal{F} \cup \{(r, r), r = 1, \ldots, M\}$. A check node with rank zero does not help the decoding, so we do
not include (d, 0) in $\mathcal{F}$ and $\bar{\mathcal{F}}$. To analyze the decoding process, we use the degree-rank distribution of the edges
defined as follows. An edge is said to be of degree d and rank r if it is connected to a check node with degree d
and rank r. Let $R_{d,r}$ be the number of edges of degree d and rank r. Define the degree-rank distribution of the
edges as

\[ R \triangleq (R_{d,r},\ (d, r) \in \bar{\mathcal{F}}). \]

Note that $R_{d,r}/d$ gives the number of check nodes with degree d and rank r.

For a check node with degree d and transfer matrix rank k, the probability that it has rank r is denoted by $\zeta_r^{d,k}$.
The details can be found in Appendix II-A, but for the purpose of the discussion here, an explicit form of $\zeta_r^{d,k}$ is
not needed. Let

\[ \hbar_{d,r} \triangleq \sum_{k=r}^{M} \zeta_r^{d,k} h_k \tag{3} \]

be the probability that a check node with degree d has rank r when the rank of the transfer matrix is chosen
according to the probability vector h. Let

\[ \rho_{d,r} \triangleq d\,\Psi_d\,\hbar_{d,r}, \tag{4} \]

where $n\rho_{d,r}$ is the expected number of edges of degree d and rank r in the decoding graph when the rank of
a transfer matrix is chosen according to the probability vector h independently. The following lemma shows that
$R_{d,r}/n$ converges in probability to $\rho_{d,r}$ as n goes to infinity.

Lemma 1: With probability at least $1 - (\gamma(n) + 2MD\exp(-2n^{2/3}))$,

\[ \left| \frac{R_{d,r}}{n} - \rho_{d,r} \right| = O(n^{-1/6}), \quad (d, r) \in \bar{\mathcal{F}}. \]

Proof: Consider the instances of $\{\pi_k\}$ satisfying (2). By the assumption on $\{\pi_k\}$, this will decrease the bound
by at most $\gamma(n)$. With an abuse of notation, we treat $\{\pi_k\}$ as an instance satisfying (2) in the rest of this
proof.

The decoding graph has $n\pi_k$ check nodes with transfer matrix rank k. For a check node with degree d and
transfer matrix rank k, the probability that it has rank r is $\zeta_r^{d,k}$ when $r \le \min\{d, k\}$, and is zero otherwise. Thus
the expected number of check nodes with degree d and rank r is

\[ \sum_{k=r}^{M} n\pi_k \Psi_d \zeta_r^{d,k} = n\Psi_d \sum_{k=r}^{M} \zeta_r^{d,k} \pi_k. \]

By Hoeffding's inequality, with probability at least $1 - 2MD\exp(-2n^{2/3})$,

\[ \left| \frac{R_{d,r}}{dn} - \Psi_d \sum_{k=r}^{M} \zeta_r^{d,k} \pi_k \right| < n^{-1/6}, \quad (d, r) \in \bar{\mathcal{F}}. \tag{5} \]

Then,

\[ \left| \frac{R_{d,r}}{dn} - \Psi_d \hbar_{d,r} \right|
\le \left| \frac{R_{d,r}}{dn} - \Psi_d \sum_{k=r}^{M} \zeta_r^{d,k} \pi_k \right|
+ \Psi_d \sum_{k=r}^{M} \zeta_r^{d,k} \left| \pi_k - h_k \right|, \]

where the inequality follows from the triangle inequality and the definition of $\hbar_{d,r}$ in (3). By (2), under the
condition of (5), we have

\[ \left| \frac{R_{d,r}}{dn} - \Psi_d \hbar_{d,r} \right| = O(n^{-1/6}), \]

with probability at least $1 - 2MD\exp(-2n^{2/3})$. The proof is completed by considering $\{\pi_k\}$ not satisfying (2). ■
We will analyze the average decoding performance of BATS(K, n, $\Psi$, h) with a random decoding strategy. In
each decoding step, an edge (U, V) with degree equal to the rank is uniformly chosen, where U is a check node
and V is a variable node. Since check node U has degree equal to the rank, variable node V is decodable. Variable
node V, as well as all the edges connected to it, is removed from the decoding graph. For each check node connected
to variable node V, three operations are applied: 1) the degree is reduced by 1; 2) the row in the generator matrix
corresponding to the variable node V is removed; and 3) the rank is updated accordingly. The decoding process
stops when there is no edge with degree equal to the rank. The following decoding analysis is based on this random
decoding strategy. In the decoding process described in the last section, decoding a check node with degree equal
to the rank can recover several variable nodes simultaneously. Note that for a given instance of the decoding graph,
both strategies will reduce the decoding graph to the same residual graph when they stop (see the discussion in
Appendix III).

Fig. 5. State transition diagram for M = 5 and D = 8. Each node in the graph represents a degree-rank pair. In each step, if the check node
connects to the decoded variable node, its state changes according to the direction of the outgoing edges of its current state. The label on an
edge shows the probability that a direction is chosen.

B. Density Evolution 

Consider the evolution of BATS(K, n, $\Psi$, h) during the decoding process. Time t starts at zero and increases by
one for each variable node removed by the decoder. Let $R_{d,r}(t)$ denote the number of edges in the residual graph
of degree d and rank r at time $t \ge 0$, with $R_{d,r}(0) = R_{d,r}$.

Upon removing a neighboring variable node of a check node with degree d and rank r, the degree of the check
node will change to d - 1. The rank of the check node may remain unchanged with probability

\[ \alpha_{d,r} \triangleq \frac{1 - q^{-d+r}}{1 - q^{-d}}, \quad (d, r) \in \bar{\mathcal{F}} \tag{6} \]

(see the derivation in Appendix II-A), or may change to r - 1 with probability $\bar{\alpha}_{d,r} = 1 - \alpha_{d,r}$. Regarding a
degree-rank pair as a state, the state transition of a check node during the decoding process is illustrated in Fig. 5.
Assume that the process has not stopped. At time t, we have K - t variable nodes left in the residual graph, and
an edge with degree equal to the rank is uniformly chosen to be removed. Let

\[ R(t) \triangleq (R_{d,r}(t) : (d, r) \in \bar{\mathcal{F}}). \]

As we will show in the following lemma, the random process $\{R(t)\}$ is a Markov chain. This suggests a
straightforward approach of computing all the transition probabilities in the Markov chain, but as discussed in [29], this
approach may lead to a complicated formula. Instead of taking this approach, we work out the expected change
$R_{d,r}(t + 1) - R_{d,r}(t)$ explicitly for all $t \ge 0$. Let

\[ R_0(t) \triangleq \sum_{r=1}^{M} R_{r,r}(t). \]

We do not need to study the behavior of $R_{r,r}(t)$ for each individual value of r, since $R_0(t)$ is sufficient to determine
when the decoding process stops. Specifically, the decoding process stops as soon as $R_0(t)$ becomes zero.
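The transition probabilities in (6) are easy to tabulate exactly. The following sketch uses rational arithmetic; the function names are hypothetical and q is the field size:

```python
from fractions import Fraction

def alpha(d, r, q):
    """Probability from (6) that a check node of degree d and rank r keeps
    rank r when one of its neighboring variable nodes is removed."""
    q_inv = Fraction(1, q)
    return (1 - q_inv ** (d - r)) / (1 - q_inv ** d)

def alpha_bar(d, r, q):
    """Probability that the rank drops to r - 1 instead."""
    return 1 - alpha(d, r, q)
```

For instance, with q = 2 a check node in state (d, r) = (2, 1) keeps its rank with probability (1 - 1/2)/(1 - 1/4) = 2/3, while a check node with d = r always loses one unit of rank, since the formula gives 0 in that case.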

Lemma 2: The random process $\{R(t)\}$ is a Markov chain, and for any constant $c \in (0, 1)$, as long as $t \le cK$
and $R_0(t) > 0$, we have

\[ E[R_{d,r}(t+1) - R_{d,r}(t) \mid R(t)] = \left( \alpha_{d+1,r} R_{d+1,r}(t) + \bar{\alpha}_{d+1,r+1} R_{d+1,r+1}(t) - R_{d,r}(t) \right) \frac{d}{K - t}, \quad (d, r) \in \mathcal{F}, \tag{7} \]

and

\[ E[R_0(t+1) - R_0(t) \mid R(t)] = \frac{\sum_r r\,\alpha_{r+1,r} R_{r+1,r}(t) - R_0(t)}{K - t} - 1 + O\!\left(\frac{1}{K - t}\right). \tag{8} \]

Proof: Fix a time t > 0. With an abuse of notation, we treat -R(O), . . . , _R(t) as instances in the proof, i.e., the 
values of these random vectors are fixed. Let (U, V) be the edge chosen to be removed at time t, where V is the 
variable node and U is the check node, according to the random decoding algorithm described in Section IIII-AI 
Note that V is uniformly distributed among all variable nodes and U must be a check node with degree equal to 
the rank at time t. 

Define indicator random variables $\iota_{d,r}(i)$, $i = 1, \ldots, R_{d,r}(t)/d$, where $\iota_{d,r}(i) = 1$ if the $i$th check node with degree $d$ and rank $r$ becomes degree $d-1$ and rank $r$ at time $t+1$. Define indicator random variables $\mu_{d,r}(i)$, $i = 1, \ldots, R_{d,r}(t)/d$, where $\mu_{d,r}(i) = 1$ if the $i$th check node with degree $d$ and rank $r$ becomes degree $d-1$ and rank $r-1$ at time $t+1$. The difference $R_{d,r}(t+1) - R_{d,r}(t)$ can then be expressed as

$$R_{d,r}(t+1) - R_{d,r}(t) = \sum_{i=1}^{R_{d+1,r}(t)/(d+1)} d\,\iota_{d+1,r}(i) + \sum_{i=1}^{R_{d+1,r+1}(t)/(d+1)} d\,\mu_{d+1,r+1}(i) - \sum_{i=1}^{R_{d,r}(t)/d} d\left(\iota_{d,r}(i) + \mu_{d,r}(i)\right). \tag{9}$$

Let us look at the joint distribution of $\iota_{d,r}(i)$, $\mu_{d,r}(i)$, $(d,r) \in \mathcal{F}$, $1 \le i \le R_{d,r}(t)/d$. Let $A_r(i)$ be the event that $U$ is the $i$th check node with degree $r$ and rank $r$. Since $(U, V)$ is uniformly distributed among all edges of check nodes with degree equal to rank, we have that $A_r(i)$ for all $r$ and $i$ are mutually exclusive and

$$\Pr\{A_r(i)\} = \frac{r}{R_0(t)}.$$

Define indicator random variable $\beta_{d,r}(i)$ with $\beta_{d,r}(i) = 1$ if $V$ is a neighbor of the $i$th check node with degree $d$ and rank $r$. Conditioning on $A_{r'}(i')$, by the construction of the random decoding graph, we know that $\beta_{d,r}(i)$, $(d,r) \in \mathcal{F}$, $i = 1, \ldots, R_{d,r}(t)/d$, are independent and

$$\Pr\{\beta_{d,r}(i) = 1 \mid A_{r'}(i')\} = \begin{cases} 1 & d = r = r',\ i = i', \\ \dfrac{d}{K-t} & \text{otherwise.} \end{cases} \tag{10}$$
Conditioning on $\beta_{d,r}(i)$, $(d,r) \in \mathcal{F}$, $i = 1, \ldots, R_{d,r}(t)/d$, by the construction of the random decoding graph, we know that $(\iota_{d,r}(i), \mu_{d,r}(i))$, $(d,r) \in \mathcal{F}$, $1 \le i \le R_{d,r}(t)/d$, are independent and $(\iota_{d,r}(i), \mu_{d,r}(i))$ only depends on $\beta_{d,r}(i)$. Specifically, we have

$$\Pr\{\iota_{d,r}(i) = 0, \mu_{d,r}(i) = 0 \mid \beta_{d,r}(i) = 0\} = 1$$

since the degree of a check node will not change if $V$ is not a neighbor,

$$\Pr\{\iota_{d,r}(i) = 1, \mu_{d,r}(i) = 1 \mid \beta_{d,r}(i) = 1\} = 0$$

since $\iota_{d,r}(i)$ and $\mu_{d,r}(i)$ cannot both be 1,

$$\Pr\{\iota_{d,r}(i) = 1, \mu_{d,r}(i) = 0 \mid \beta_{d,r}(i) = 1\} = \alpha_{d,r} \tag{11}$$

(cf. the definition of $\alpha_{d,r}$), and

$$\Pr\{\iota_{d,r}(i) = 0, \mu_{d,r}(i) = 1 \mid \beta_{d,r}(i) = 1\} = \bar\alpha_{d,r}. \tag{12}$$

By (9), $\mathbf{R}(t+1)$ is a deterministic function of $\mathbf{R}(t)$ and $\iota_{d,r}(i)$, $\mu_{d,r}(i)$, $(d,r) \in \mathcal{F}$, $i = 1, \ldots, R_{d,r}(t)/d$, where the distribution of the latter part is determined by $\mathbf{R}(t)$, independent of $\mathbf{R}(t')$, $t' < t$. Thus, the random process $\{\mathbf{R}(t)\}$ is a Markov chain.

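Now we calculate the marginal distributions of $\iota_{d,r}(i)$ and $\mu_{d,r}(i)$ for all $(d,r)$ and $i$. When $d \ne r$, we have

$$\begin{aligned}
\Pr\{\iota_{d,r}(i) = 1\} &= \sum_{a,b} \sum_{r',i'} \Pr\{\iota_{d,r}(i) = 1, \mu_{d,r}(i) = b \mid \beta_{d,r}(i) = a\} \Pr\{\beta_{d,r}(i) = a \mid A_{r'}(i')\} \Pr\{A_{r'}(i')\} \\
&= \sum_{r',i'} \alpha_{d,r} \Pr\{\beta_{d,r}(i) = 1 \mid A_{r'}(i')\} \Pr\{A_{r'}(i')\} \qquad (13) \\
&= \sum_{r',i'} \alpha_{d,r} \frac{d}{K-t} \Pr\{A_{r'}(i')\} \qquad (14) \\
&= \alpha_{d,r} \frac{d}{K-t},
\end{aligned}$$

where (13) follows from (11), and (14) follows from (10) with $d \ne r$; and similarly

$$\Pr\{\mu_{d,r}(i) = 1\} = \bar\alpha_{d,r} \frac{d}{K-t}.$$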

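When $d = r$, we have

$$\begin{aligned}
\Pr\{\iota_{r,r}(i) = 1\} &= \sum_{a,b} \sum_{r',i'} \Pr\{\iota_{r,r}(i) = 1, \mu_{r,r}(i) = b \mid \beta_{r,r}(i) = a\} \Pr\{\beta_{r,r}(i) = a \mid A_{r'}(i')\} \Pr\{A_{r'}(i')\} \\
&= \sum_{r',i'} \alpha_{r,r} \Pr\{\beta_{r,r}(i) = 1 \mid A_{r'}(i')\} \Pr\{A_{r'}(i')\} \qquad (15) \\
&= 0, \qquad (16)
\end{aligned}$$

where (15) follows from (11), and (16) follows from $\alpha_{r,r} = 0$ (cf. the definition of $\alpha_{d,r}$); and

$$\begin{aligned}
\Pr\{\mu_{r,r}(i) = 1\} &= \sum_{a,b} \sum_{r',i'} \Pr\{\iota_{r,r}(i) = b, \mu_{r,r}(i) = 1 \mid \beta_{r,r}(i) = a\} \Pr\{\beta_{r,r}(i) = a \mid A_{r'}(i')\} \Pr\{A_{r'}(i')\} \\
&= \sum_{r',i'} \Pr\{\beta_{r,r}(i) = 1 \mid A_{r'}(i')\} \Pr\{A_{r'}(i')\} \qquad (17) \\
&= \Pr\{A_r(i)\} + \frac{r}{K-t} \sum_{r',i'} \Pr\{A_{r'}(i')\} - \frac{r}{K-t} \Pr\{A_r(i)\} \qquad (18) \\
&= \frac{r}{R_0(t)} + \frac{r}{K-t} - \frac{r}{R_0(t)} \cdot \frac{r}{K-t},
\end{aligned}$$

where (17) follows from (12) and $\bar\alpha_{r,r} = 1$, and (18) follows from (10).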

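The expectation in (7) is obtained by taking expectation on (9). To verify (8), note that when $d = r$ in (9), $\iota_{r,r}(i) = 0$ in the last term. Then we have

$$\begin{aligned}
R_0(t+1) - R_0(t) &= \sum_r \left(R_{r,r}(t+1) - R_{r,r}(t)\right) \\
&= \sum_r \sum_{i=1}^{R_{r+1,r}(t)/(r+1)} r\,\iota_{r+1,r}(i) - \sum_r \sum_{i=1}^{R_{r,r}(t)/r} \mu_{r,r}(i). \qquad (19)
\end{aligned}$$

Taking expectation on (19), we have

$$\begin{aligned}
E[R_0(t+1) - R_0(t) \mid \mathbf{R}(t)] &= \sum_r \frac{r\,\alpha_{r+1,r} R_{r+1,r}(t)}{K-t} - \sum_r \left(\frac{R_{r,r}(t)}{R_0(t)} + \left(1 - \frac{r}{R_0(t)}\right)\frac{R_{r,r}(t)}{K-t}\right) \\
&= \frac{\sum_r r\,\alpha_{r+1,r} R_{r+1,r}(t) - R_0(t)}{K-t} - 1 + \sum_r \frac{r R_{r,r}(t)}{R_0(t)(K-t)}.
\end{aligned}$$

The expectation in (8) is obtained by noting that $\sum_r \frac{r R_{r,r}(t)}{R_0(t)(K-t)} \le \frac{M}{K-t} < \frac{M}{(1-c)K}$ since $t < cK$ by assumption.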



C. Sufficient and Necessary Conditions 

We care about when $R_0(t)$ goes to zero for the first time. The evolution of $R_0(t)$ depends on that of $R_{d,r}(t)$, $(d,r) \in \mathcal{F}$. To study the trend of $R_0(t)$, the differential equation approach [29] leads us to consider the system of differential equations

$$\frac{d\rho_{d,r}(\tau)}{d\tau} = \left(\alpha_{d+1,r}\rho_{d+1,r}(\tau) + \bar\alpha_{d+1,r+1}\rho_{d+1,r+1}(\tau) - \rho_{d,r}(\tau)\right)\frac{d}{\theta - \tau}, \quad (d,r) \in \mathcal{F}, \tag{20}$$

$$\frac{d\rho_0(\tau)}{d\tau} = \frac{\sum_{r} r\,\alpha_{r+1,r}\rho_{r+1,r}(\tau) - \rho_0(\tau)}{\theta - \tau} - 1 \tag{21}$$

with initial values $\rho_{d,r}(0) = \rho_{d,r}$, $(d,r) \in \mathcal{F}$, and $\rho_0(0) = \sum_r \rho_{r,r}$, where $\theta = K/n$ is the design rate of the BATS code.

We can get some intuition about how the system of differential equations is obtained by replacing $R_{d,r}(t)$ and $R_0(t)$ with $n\rho_{d,r}(t/n)$ and $n\rho_0(t/n)$, respectively, in (7) and (8). Defining $\tau = t/n$ and letting $n \to \infty$, we obtain the system of differential equations in (20) and (21). The expectation is ignored because $\rho_{d,r}(\tau)$ and $\rho_0(\tau)$ are deterministic functions. Theorem 5 in Appendix IV makes the above intuition rigorous.

The system of differential equations in (20) and (21) is solved in Appendix V for $0 \le \tau < \theta$. The solution of (21) is

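$$\rho_0(\tau) = \left(1 - \frac{\tau}{\theta}\right)\left(\sum_{r=1}^{M} \alpha_{r+1,r} \sum_{d=r+1}^{D} \rho_{d,r}^{(d-r-1)} I_{d-r,r}\!\left(\frac{\tau}{\theta}\right) + \sum_{r=1}^{M} \rho_{r,r} + \theta \ln\!\left(1 - \frac{\tau}{\theta}\right)\right), \tag{22}$$

where $\rho_{d,r}^{(d-r-1)}$ is defined by the recursive formula

$$\rho_{d,r}^{(0)} \triangleq \rho_{d,r}, \tag{23}$$

$$\rho_{d,r}^{(i+1)} \triangleq \alpha_{d-i,r}\,\rho_{d,r}^{(i)} + \bar\alpha_{d-i,r+1}\,\rho_{d,r+1}^{(i)}, \tag{24}$$

and

$$I_{a,b}(x) \triangleq \sum_{j=a}^{a+b-1} \binom{a+b-1}{j} x^j (1-x)^{a+b-1-j}$$

is called the regularized incomplete beta function. For $\bar\eta \in (0,1)$, the following theorem shows that if $\rho_0(\tau) > 0$ for $\tau \in [0, \bar\eta\theta]$, then the decoding does not stop until $t > \bar\eta K$ with high probability, and $R_{d,r}(t)$ and $R_0(t)$ can be approximated by $n\rho_{d,r}(t/n)$ and $n\rho_0(t/n)$, respectively.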

Theorem 1: Consider a sequence of decoding graphs BATS$(K, n, \Psi, h)$, $n = 1, 2, \ldots$ with fixed $\theta = K/n$, and the empirical rank distribution of transfer matrices $(\pi_0, \ldots, \pi_M)$ satisfying

$$|\pi_i - h_i| = O(n^{-1/6}), \quad 0 \le i \le M, \tag{25}$$

with probability at least $1 - \gamma(n)$, where $\gamma(n) = o(1)$. For $\bar\eta \in (0,1)$,

(i) if $\rho_0(\tau) > 0$ for $\tau \in [0, \bar\eta\theta]$, then for sufficiently large $K$, with probability $1 - O(n^{7/24}\exp(-n^{1/8})) - \gamma(n)$, the decoding terminates with at least $\bar\eta K$ variable nodes decoded, and

$$|R_{d,r}(t) - n\rho_{d,r}(t/n)| = O(n^{5/6}), \quad (d,r) \in \mathcal{F},$$
$$|R_0(t) - n\rho_0(t/n)| = O(n^{5/6})$$

uniformly for $t \in [0, \bar\eta K]$;

(ii) if $\rho_0(\tau) < 0$ for some $\tau \in [0, \bar\eta\theta]$, then for sufficiently large $K$, with probability $1 - O(n^{7/24}\exp(-n^{1/8})) - \gamma(n)$, the decoding terminates before $\bar\eta K$ variable nodes are decoded.

Proof: See Appendix IV. ■

IV. Optimization of Degree Distribution 

Theorem 1 gives a sufficient and a necessary condition such that the BP decoding succeeds with high probability. These conditions induce an optimization problem that generates a degree distribution that meets our requirement in Section II-C.

A. Optimization

We first define some new notation to help formulate the optimization of the degree distribution. Let

$$h_r^* \triangleq \alpha_{r+1,r}\, h_{r+1,r}. \tag{26}$$

Since $h_{r+1,r}$ is a linear function of $h$ (ref. (3)), $h_r^*$ is also a linear function of $h$. Define

$$\Omega(x; h, \Psi) \triangleq \sum_{r=1}^{M} h_r^* \sum_{d=r+1}^{D} d\Psi_d I_{d-r,r}(x) + \sum_{r=1}^{M} h_{r,r}\, r\Psi_r. \tag{27}$$

When the context is clear, we also write $\Omega(x; \Psi)$, $\Omega(x; h)$ or $\Omega(x)$ to simplify the notation. The expression of $\rho_0$ in (22) can be simplified as follows.

Lemma 3: $\rho_0(\tau) = (1 - \tau/\theta)\left(\Omega(\tau/\theta) + \theta \ln(1 - \tau/\theta)\right)$.
Proof: Define

$$h_{d,r}^{(0)} \triangleq h_{d,r}, \quad 1 \le r \le M,\ r \le d \le D. \tag{28}$$

For $d > r$ and $0 \le i \le d - r - 1$, define

$$h_{d,r}^{(i+1)} \triangleq \alpha_{d-i,r}\, h_{d,r}^{(i)} + \bar\alpha_{d-i,r+1}\, h_{d,r+1}^{(i)}. \tag{29}$$

By Lemma 4 in Appendix II-A, $h_{d,r}^{(d-r-1)} = h_{r+1,r}$. By the definition of $\rho_{d,r}$, (23) and (28), we have $\rho_{d,r}^{(0)} = d\Psi_d h_{d,r}^{(0)}$. Since the recursive formulas in (24) and (29) are the same, we have for $i = 1, \ldots, d - r - 1$,

$$\rho_{d,r}^{(i)} = d\Psi_d h_{d,r}^{(i)}.$$

Substitute $\rho_{d,r}^{(d-r-1)} = d\Psi_d h_{d,r}^{(d-r-1)} = d\Psi_d h_{r+1,r}$ in (22). Last, using the definition of $h_r^*$ in (26), we obtain the formula in the lemma. ■
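As an illustration of Lemma 3, the following Python sketch evaluates $\Omega(x; h, \Psi)$ as in (27) and $\rho_0(\tau)$ as in the lemma. It is only a sketch under stated assumptions: the function names and the list layout (index 0 unused) are choices made here, not part of the paper, and the values $h_r^*$ and $h_{r,r}$ are taken as given inputs rather than derived from a rank distribution.

```python
from math import comb, log1p

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_{a,b}(x) for positive integers a, b,
    computed from its binomial-sum form."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def omega(x, psi, h_star, h_diag):
    """Omega(x; h, Psi) following (27).

    psi[d]    : degree distribution Psi_d, d = 1..D (psi[0] unused)
    h_star[r] : h*_r, r = 1..M (h_star[0] unused)
    h_diag[r] : h_{r,r}, r = 1..M (h_diag[0] unused)
    """
    D, M = len(psi) - 1, len(h_star) - 1
    total = 0.0
    for r in range(1, M + 1):
        total += h_star[r] * sum(d * psi[d] * reg_inc_beta(d - r, r, x)
                                 for d in range(r + 1, D + 1))
        if r <= D:
            total += h_diag[r] * r * psi[r]
    return total

def rho0(tau, theta, psi, h_star, h_diag):
    """rho_0(tau) = (1 - tau/theta) * (Omega(tau/theta) + theta * ln(1 - tau/theta))."""
    x = tau / theta
    return (1 - x) * (omega(x, psi, h_star, h_diag) + theta * log1p(-x))
```

At $\tau = 0$ the logarithm vanishes, so `rho0` reduces to $\Omega(0) = \sum_r h_{r,r} r \Psi_r$, the quantity that decides whether the decoding can start.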
For $\bar\eta \in (0,1)$, we say a rate $\theta$ is $\bar\eta$-achievable by BATS codes if for every $\epsilon > 0$ and every sufficiently large $K$ there exists a BATS code with $K$ input packets such that for $n \le K/\theta$ received batches, the BP decoding recovers at least $\bar\eta K$ input packets with probability at least $1 - \epsilon$. Define an optimization problem

$$\max\ \theta \tag{30a}$$
$$\text{s.t.}\quad \Omega(x) + \theta \ln(1 - x) \ge 0, \quad 0 \le x \le \bar\eta, \tag{30b}$$
$$\sum_d \Psi_d = 1 \ \text{ and } \ \Psi_d \ge 0, \quad d = 1, \ldots, D. \tag{30c}$$

Let $\hat\theta$ be the optimal value in (30).

Proposition 1: When the empirical rank distribution of the transfer matrices converges to $h = (h_0, \ldots, h_M)$ (in the sense of (25)), for any $\epsilon > 0$, the rate $\hat\theta - \epsilon$ is $\bar\eta$-achievable by BATS codes.

Proof: To show that $\hat\theta - \epsilon$ is $\bar\eta$-achievable, by Theorem 1 and Lemma 3, we only need to show that there exists a degree distribution such that

$$\Omega(x) + (\hat\theta - \epsilon)\ln(1 - x) > 0, \quad 0 \le x \le \bar\eta. \tag{31}$$






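For the degree distribution $\Psi$ that achieves $\hat\theta$, we have from (30b)

$$\Omega(x; \Psi) + \hat\theta \ln(1 - x) \ge 0, \quad 0 \le x \le \bar\eta.$$

Multiplying by $\frac{\hat\theta - \epsilon}{\hat\theta}$, we have

$$\frac{\hat\theta - \epsilon}{\hat\theta}\,\Omega(x; \Psi) + (\hat\theta - \epsilon)\ln(1 - x) \ge 0, \quad 0 \le x \le \bar\eta. \tag{32}$$

Since $\Omega(x; \Psi) > 0$ for $x > 0$, (32) implies that $\Psi$ satisfies (31) except possibly for $x = 0$. Checking the definition of $\Omega$ in (27), we have $\Omega(0; \Psi) = \sum_{r=1}^{M} h_{r,r}\, r\Psi_r$. If $\sum_{r=1}^{M} h_{r,r}\, r\Psi_r > 0$, which implies $\Psi$ satisfies (31), we are done. In the following, we consider $\sum_{r=1}^{M} h_{r,r}\, r\Psi_r = 0$.

Let $r^*$ be the largest integer $r$ such that $h_r > 0$. We can characterize that $h_{r,r} = 0$ for $r > r^*$ and $h_{r,r} > 0$ for $r \le r^*$ (cf. (3) and (45) in Appendix II-A). Since $\sum_{r=1}^{M} h_{r,r}\, r\Psi_r = 0$, we know that $\sum_{d=1}^{r^*} \Psi_d = 0$. Define a new degree distribution $\Psi'$ by $\Psi'_d = \frac{\hat\theta - \epsilon}{\hat\theta}\Psi_d$ for $d > r^*$ and $\Psi'_d = \Delta$ for $d \le r^*$, where $\Delta > 0$ can be determined by the constraint $\sum_d \Psi'_d = 1$. The formulation of $\Omega$ in (27) can be rewritten as

$$\Omega(x; \Psi) = \sum_{d=1}^{D} \Psi_d f_d(x)$$

for certain functions $f_d(x)$, $d = 1, \ldots, D$, not related to $\Psi$. Using the above formulation, we have for $0 \le x \le \bar\eta$,

$$\begin{aligned}
\Omega(x; \Psi') - \frac{\hat\theta - \epsilon}{\hat\theta}\Omega(x; \Psi) &= \sum_{d \le r^*} \Psi'_d f_d(x) + \sum_{d > r^*} \frac{\hat\theta - \epsilon}{\hat\theta}\Psi_d f_d(x) - \frac{\hat\theta - \epsilon}{\hat\theta}\sum_{d > r^*} \Psi_d f_d(x) \\
&= \sum_{d=1}^{r^*} \Delta f_d(x) \\
&\ge \Delta \sum_{d=1}^{r^*} d\, h_{d,d} \qquad (33) \\
&> 0,
\end{aligned}$$

where (33) follows from $f_d(x) \ge d\, h_{d,d}$. By (32), $\Psi'$ satisfies (31). ■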
For many cases, we can directly use the degree distribution $\Psi$ obtained by solving (30). But when $\Omega(0; \Psi) = 0$, by Lemma 3, $\rho_0(0) = \sum_r \rho_{r,r} = 0$, and hence $\Psi_d = 0$, $d \le M$ (cf. the definition of $\rho_{d,r}$). Thus, $\Psi$ does not guarantee that the decoding can start. We can then modify $\Psi$ as we do in the proof of Proposition 1 by increasing the probability masses $\Psi_d$, $d \le M$, a little bit to make sure that the decoding can start.

The maximum degree $D$ in (30c) affects the encoding/decoding complexity. In Section III-A, we have assumed that $D = O(M)$. The next theorem shows that it is optimal to choose $D \le \lceil M/\eta \rceil - 1$, where $\eta = 1 - \bar\eta$.

Theorem 2: Using $D > \lceil M/\eta \rceil - 1$ does not give a better optimal value in (30), where $\eta = 1 - \bar\eta$.

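Proof: Consider an integer $\Delta$ such that $\eta \ge \frac{M}{\Delta + 1}$. Let $\Psi$ be a degree distribution with $\sum_{d > \Delta} \Psi_d > 0$. Construct a new degree distribution $\tilde\Psi$ as follows:

$$\tilde\Psi_d = \Psi_d, \quad d < \Delta,$$
$$\tilde\Psi_\Delta = \sum_{d \ge \Delta} \Psi_d,$$

and

$$\tilde\Psi_d = 0, \quad d > \Delta.$$

We can show that $\Omega(x; \tilde\Psi) \ge \Omega(x; \Psi)$ for all $0 \le x \le 1 - \eta$. Write

$$\Omega(x; \tilde\Psi) - \Omega(x; \Psi) = \sum_{d=\Delta+1}^{\infty} \Psi_d \sum_{r=1}^{M} h_r^* \left(\Delta I_{\Delta - r, r}(x) - d\, I_{d-r,r}(x)\right).$$

For $d \ge \Delta + 1$,

$$\frac{r - 1}{d - r} \le \frac{M - 1}{d - M} \le \frac{M - 1}{\Delta - M + 1} \le \frac{\eta}{1 - \eta}.$$

So we can apply Lemma 8 in Appendix II-B to show that, for $0 < x \le 1 - \eta$,

$$\frac{d\, I_{d-r,r}(x)}{(d-1)\, I_{d-1-r,r}(x)} \le \frac{d}{d-1}\left(1 - \frac{\eta}{r}\right) \le \frac{d}{d-1}\left(1 - \frac{\eta}{M}\right) \le \frac{\Delta + 1}{\Delta}\left(1 - \frac{1}{\Delta + 1}\right) = 1,$$

which gives $\Omega(x; \tilde\Psi) \ge \Omega(x; \Psi)$ for $0 \le x \le 1 - \eta$.

Thus, for certain $\theta$ such that

$$\Omega(x; \Psi) + \theta \ln(1 - x) \ge 0, \quad 0 \le x \le 1 - \eta,$$

we have

$$\Omega(x; \tilde\Psi) + \theta \ln(1 - x) \ge 0, \quad 0 \le x \le 1 - \eta.$$

This means that $\tilde\Psi$ is potentially better than $\Psi$. So we do not need to consider a degree distribution $\Psi$ with $\sum_{d > \Delta} \Psi_d > 0$. Thus, it is sufficient to take the maximum degree $D \le \min_{\eta \ge M/(\Delta+1)} \Delta = \lceil M/\eta \rceil - 1$. ■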
The converse of Proposition 1 is that "a rate larger than $\hat\theta$ is not $\bar\eta$-achievable". Intuitively, for any $\epsilon > 0$, we cannot have a degree distribution such that

$$\Omega(x) + (\hat\theta + \epsilon)\ln(1 - x) \ge 0, \quad 0 \le x \le \bar\eta,$$

since otherwise $\hat\theta$ is not the optimal value in (30). Thus, for any degree distribution, $\Omega(x) + (\hat\theta + \epsilon)\ln(1 - x) < 0$ for some $x \in [0, \bar\eta]$. Taking $\hat\theta + \epsilon$ in place of $\theta$ in Lemma 3, for any degree distribution, $\rho_0(\tau) < 0$ for some $\tau \in [0, \bar\eta(\hat\theta + \epsilon)]$. Hence, we can apply the second part of Theorem 1 to show that $\hat\theta + \epsilon$ is not $\bar\eta$-achievable for any degree distribution, since for any degree distribution there exists $K_0$ such that when the number of input packets $K > K_0$, with probability approaching 1 the BATS code cannot recover $\bar\eta K$ input packets. To rigorously prove this converse, however, we need a uniform bound $K_0$ for all degree distributions such that the second part of Theorem 1 holds, which is very tedious if not impossible. Instead of taking this approach, we demonstrate in the rest of the section that $\hat\theta$ is close to the capacity of the underlying LOC (cf. Section II-B and Appendix I).

B. Upper Bounds on the Achievable Rates

The first upper bound on the optimal value of (30) is given by the capacity of LOCs. When the empirical rank distribution of the transfer matrices converges to $h = (h_0, \ldots, h_M)$, the capacity of a LOC, in terms of packets per use, is $E[h] = \sum_r r h_r$ (see Appendix I). The BP decoding algorithm recovers at least a fraction $\bar\eta$ of all the input packets with high probability. So asymptotically BATS codes under BP decoding can recover input packets at a rate of at least $\bar\eta\hat\theta$ packets per batch. Thus, we have $\bar\eta\hat\theta \le E[h]$.

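A tighter upper bound can be obtained by analyzing (30) directly. Using Lemma 5 in Appendix II-A, rewrite

$$\begin{aligned}
\Omega(x; \Psi) &= \sum_{r=1}^{M} h_r^* \sum_{d=r+1}^{D} d\Psi_d I_{d-r,r}(x) + \sum_{r=1}^{M} h_r^* \sum_{d=1}^{r} d\Psi_d \\
&= \sum_{r=1}^{M} h_r^* S_r(x; \Psi), \qquad (34)
\end{aligned}$$

where

$$S_r(x; \Psi) = S_r(x) \triangleq \sum_{d=r+1}^{D} d\Psi_d I_{d-r,r}(x) + \sum_{d=1}^{r} d\Psi_d. \tag{35}$$

This form of $\Omega(x; \Psi)$ will be used in subsequent proofs.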
Theorem 3: The optimal value $\hat\theta$ of (30) satisfies

$$(1 - \eta)\hat\theta \le \sum_{r=1}^{M} r h_r^*,$$

where $h_r^*$ is defined in (26).

Note that by Lemma 6 in Appendix II-A we have $\sum_r r h_r^* < \sum_r r h_r$, i.e., Theorem 3 gives a strictly better upper bound than $E[h]$. When $q \to \infty$, $\sum_r r h_r^* \to \sum_r r h_r$ (cf. (56) in Appendix II-A). So when the field size is large, these two upper bounds are very close.

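Proof: Using (58) in Appendix II-B, we have

$$\begin{aligned}
\int_0^1 S_r(x)\,dx &= \sum_{d=r+1}^{D} d\Psi_d \int_0^1 I_{d-r,r}(x)\,dx + \sum_{d=1}^{r} d\Psi_d \\
&= \sum_{d=r+1}^{D} r\Psi_d + \sum_{d=1}^{r} d\Psi_d \\
&\le \sum_{d=r+1}^{D} r\Psi_d + \sum_{d=1}^{r} r\Psi_d \\
&= r \sum_{d=1}^{D} \Psi_d \\
&= r.
\end{aligned}$$

Hence,

$$\int_0^1 \Omega(x)\,dx = \sum_{r=1}^{M} h_r^* \int_0^1 S_r(x)\,dx \le \sum_{r=1}^{M} r h_r^*. \tag{36}$$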

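Since $\Omega(x)$ is an increasing function and the inequality in (30b) holds for $x = 1 - \eta = \bar\eta$,

$$\int_{1-\eta}^{1} \Omega(x)\,dx \ge \eta\,\Omega(1 - \eta) \ge -\eta\hat\theta \ln \eta. \tag{37}$$

Since $\Omega(x) + \hat\theta \ln(1 - x) \ge 0$ for $0 \le x \le 1 - \eta$,

$$\int_0^{1-\eta} \Omega(x)\,dx - \hat\theta(\eta \ln \eta + 1 - \eta) = \int_0^{1-\eta} \Omega(x)\,dx + \hat\theta \int_0^{1-\eta} \ln(1 - x)\,dx \ge 0. \tag{38}$$

Therefore, by (36)-(38), we have

$$\begin{aligned}
\sum_{r=1}^{M} r h_r^* &\ge \int_0^1 \Omega(x)\,dx \\
&= \int_0^{1-\eta} \Omega(x)\,dx + \int_{1-\eta}^{1} \Omega(x)\,dx \\
&\ge \hat\theta(\eta \ln \eta + 1 - \eta) - \eta\hat\theta \ln \eta \\
&= \hat\theta(1 - \eta).
\end{aligned}$$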


C. Lower Bound on the Achievable Rates 

We prove for a special case and demonstrate by simulation for general cases that the optimal value $\hat\theta$ of (30) is very close to $\sum_r r h_r^*$.

Theorem 4: The optimal value $\hat\theta$ of (30) satisfies

$$\hat\theta \ge \max_{r = 1, 2, \ldots, M}\ r \sum_{r' = r}^{M} h_{r'}^*.$$

Before proving Theorem 4, we first explain why Theorem 3 and Theorem 4 together demonstrate that $\hat\theta$ is close to the capacity of the underlying LOC for a special case. Consider the case with $h_\kappa = 1$ for some $1 \le \kappa \le M$. Theorem 4 implies that $\hat\theta \ge \kappa h_\kappa^*$. On the other hand, Theorem 3 says that $(1 - \eta)\hat\theta \le \sum_r r h_r^* = \kappa h_\kappa^* + \sum_{r < \kappa} r h_r^*$ (cf. (56) in Appendix II-A). Note that $\eta$ can be arbitrarily small, and $\sum_{r < \kappa} r h_r^* \to 0$ and $\kappa h_\kappa^* \to \kappa h_\kappa$ when the field size goes to infinity (again cf. (56)). Thus, the upper bound in Theorem 3 and the lower bound in Theorem 4 match $E[h] = \kappa h_\kappa$ asymptotically when $h_\kappa = 1$ for some $1 \le \kappa \le M$.
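Proof: Define degree distribution $\Psi^r$ as

$$\Psi_d^r = \begin{cases} 0 & d \le r, \\ \dfrac{r}{d(d-1)} & d = r+1, \ldots, D-1, \\ \dfrac{r}{D-1} & d = D. \end{cases} \tag{39}$$

Recall the definition of $S_r(x; \Psi)$ in (35). For $M \ge r' \ge r$, we will show that

$$S_{r'}(x; \Psi^r) + r \ln(1 - x) \ge 0, \quad 0 \le x \le 1 - \eta. \tag{40}$$

By Lemma 9 in Appendix II-B,

$$-r \ln(1 - x) = r \sum_{d = r'+1}^{\infty} \frac{1}{d-1} I_{d-r',r'}(x).$$

By (35) and (39),

$$\begin{aligned}
S_{r'}(x; \Psi^r) + r \ln(1 - x) &\ge \sum_{d=r'+1}^{D} d\Psi_d^r I_{d-r',r'}(x) - r \sum_{d=r'+1}^{\infty} \frac{1}{d-1} I_{d-r',r'}(x) \\
&= r\frac{D}{D-1} I_{D-r',r'}(x) - r \sum_{d=D}^{\infty} \frac{1}{d-1} I_{d-r',r'}(x) \\
&= r I_{D-r',r'}(x) - r \sum_{d=D+1}^{\infty} \frac{1}{d-1} I_{d-r',r'}(x).
\end{aligned}$$

We will show that $I_{D-r',r'}(x) \ge \sum_{d=D+1}^{\infty} \frac{1}{d-1} I_{d-r',r'}(x)$ for $x \in [0, 1 - \eta]$. This is equivalent to showing that

$$\sum_{d=D+1}^{\infty} \frac{1}{d-1} \frac{I_{d-r',r'}(x)}{I_{D-r',r'}(x)} \le 1 \quad \text{for } x \in (0, 1 - \eta]. \tag{41}$$

By Lemma 7 in Appendix II-B, $\frac{I_{d-r',r'}(x)}{I_{D-r',r'}(x)}$ is monotonically increasing, so we only need to prove the above inequality for $x = 1 - \eta$. By Lemma 8 in Appendix II-B, $\frac{I_{d-r',r'}(1-\eta)}{I_{D-r',r'}(1-\eta)} \le \left(1 - \frac{\eta}{r'}\right)^{d-D} \le \left(1 - \frac{\eta}{M}\right)^{d-D}$. Therefore,

$$\begin{aligned}
\sum_{d=D+1}^{\infty} \frac{1}{d-1} \frac{I_{d-r',r'}(x)}{I_{D-r',r'}(x)} &\le \frac{1}{D} \sum_{d=D+1}^{\infty} \frac{I_{d-r',r'}(1-\eta)}{I_{D-r',r'}(1-\eta)} \\
&\le \frac{1}{D} \sum_{d=D+1}^{\infty} \left(1 - \frac{\eta}{M}\right)^{d-D} \\
&= \frac{M - \eta}{D\eta} \\
&\le 1,
\end{aligned}$$

where the last inequality follows from $D = \lceil M/\eta \rceil - 1$. So we have established (41) and hence (40).

Last, by (34) and (40), we have for $0 \le x \le 1 - \eta$,

$$\Omega(x; \Psi^r) \ge \sum_{r' \ge r} h_{r'}^* S_{r'}(x; \Psi^r) \ge -\ln(1 - x)\, r \sum_{r' \ge r} h_{r'}^*,$$

i.e.,

$$\Omega(x; \Psi^r) + r \sum_{r' \ge r} h_{r'}^* \ln(1 - x) \ge 0. \tag{42}$$

Comparing (42) and (30b), we conclude that $\hat\theta \ge r \sum_{r' \ge r} h_{r'}^*$. The proof is completed by considering all $r = 1, 2, \ldots, M$. ■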

D. Numerical Results 

To see the achievable rates for the general cases, we numerically solve (30) by taking discrete values of $x$. Let $x_i = (1 - \eta)\,i/N$ for some integer $N$. We relax (30b) by considering only $x = x_i$, $i = 1, \ldots, N$. Let $\hat\theta$ be the optimal value of (30) with this relaxation. Numerical results show that $\hat\theta$ decreases as $N$ grows, and that when $N$ is reasonably large, e.g., 100, the optimal value becomes stable.
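The discretization can be illustrated for a fixed degree distribution. The sketch below does not solve the optimization (30); it only evaluates, on the grid $x_i = (1 - \eta)i/N$, the largest $\theta$ for which a given $\Psi$ satisfies the relaxed constraint (30b). The function names are hypothetical, $\Omega$ is evaluated in the form (34)-(35), and the example uses the degree distribution (39) with $r = 1$ in the idealized case $M = 1$, $h_1^* = 1$, for which Theorem 4 predicts a value of at least 1.

```python
from math import ceil, comb, log1p

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_{a,b}(x), binomial-sum form."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def psi_lower_bound(r, D):
    """Degree distribution (39): r/(d(d-1)) on d = r+1..D-1 and r/(D-1) on d = D."""
    psi = [0.0] * (D + 1)          # psi[0] is unused
    for d in range(r + 1, D):
        psi[d] = r / (d * (d - 1))
    psi[D] = r / (D - 1)
    return psi

def omega(x, psi, h_star):
    """Omega(x; Psi) = sum_r h*_r S_r(x; Psi); index 0 of both lists is unused."""
    D, M = len(psi) - 1, len(h_star) - 1
    total = 0.0
    for r in range(1, M + 1):
        s_r = sum(d * psi[d] * reg_inc_beta(d - r, r, x) for d in range(r + 1, D + 1))
        s_r += sum(d * psi[d] for d in range(1, min(r, D) + 1))
        total += h_star[r] * s_r
    return total

def max_theta_on_grid(psi, h_star, eta, N=100):
    """Largest theta with Omega(x_i) + theta*ln(1 - x_i) >= 0 at every grid point."""
    xs = [(1 - eta) * i / N for i in range(1, N + 1)]
    return min(omega(x, psi, h_star) / -log1p(-x) for x in xs)

M, eta = 1, 0.1
D = ceil(M / eta) - 1                      # maximum degree from Theorem 2
psi = psi_lower_bound(1, D)
theta = max_theta_on_grid(psi, [0.0, 1.0], eta)
```

A full solver would additionally optimize over $\Psi$, which is a linear program once $x$ is discretized; that step is omitted here.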
Fig. 6. The empirical cumulative distribution function (CDF) of $\Delta = \left(\sum_{r=1}^{M} r h_r^* - (1 - \eta)\hat\theta\right) / \sum_{r=1}^{M} r h_r^*$ for 10000 rank distributions chosen uniformly at random. Here $q = 2^8$ and $M = 16$. (Horizontal axis: value of $\Delta$, from 0.005 to 0.04.)



Set $M = 16$, $q = 2^8$ and $\eta = 0.01$. A rank distribution $(h_0, h_1, \ldots, h_M)$ is generated as follows: first, for $i = 0, \ldots, M$, $h_i$ is independently and uniformly chosen between zero and one; second, the rank distribution is normalized such that $\sum_i h_i = 1$. We compute $\hat\theta$ for 10000 independently generated rank distributions and compare $(1 - \eta)\hat\theta$ with $\sum_r r h_r^*$ by computing $\Delta = \left(\sum_r r h_r^* - (1 - \eta)\hat\theta\right) / \sum_r r h_r^*$. The results show that for more than 99% of the rank distributions, $\Delta$ is smaller than 0.02, and among all the samples the largest $\Delta$ is 0.0352. Note that for these parameters, the difference $\sum_r r h_r - \sum_r r h_r^*$ is of the order $10^{-3}$, so the upper bound in Theorem 3 is indeed very close to the capacity. The empirical cumulative distribution function of $\Delta$ is drawn in Fig. 6.
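The size of the gap $\sum_r r h_r - \sum_r r h_r^*$ can be checked numerically. The following sketch generates one rank distribution by the procedure above and evaluates the gap for $q = 2^8$ and $M = 16$. It assumes the closed form $h_r^* = \sum_{i=r}^{M} \zeta_r^i q^{-(i-r)} h_i$ from Appendix II-A; the function names and the fixed seed are illustrative choices, not taken from the paper.

```python
import random

def zeta(m, r, q):
    """zeta_r^m = prod_{j=0}^{r-1} (1 - q^-(m-j)), with zeta_0^m = 1 (requires r <= m)."""
    out = 1.0
    for j in range(r):
        out *= 1.0 - float(q) ** -(m - j)
    return out

def h_star(h, r, q):
    """h*_r = sum_{i=r}^{M} zeta_r^i * q^-(i-r) * h_i (assumed closed form)."""
    M = len(h) - 1
    return sum(zeta(i, r, q) * float(q) ** -(i - r) * h[i] for i in range(r, M + 1))

def random_rank_distribution(M, rng):
    """Draw h_0..h_M independently and uniformly from [0, 1), then normalize."""
    raw = [rng.random() for _ in range(M + 1)]
    total = sum(raw)
    return [v / total for v in raw]

def capacity_gap(h, q):
    """sum_r r*h_r - sum_r r*h*_r for a rank distribution h."""
    M = len(h) - 1
    expected_rank = sum(r * h[r] for r in range(1, M + 1))
    upper_bound = sum(r * h_star(h, r, q) for r in range(1, M + 1))
    return expected_rank - upper_bound

rng = random.Random(0)            # fixed seed so the run is repeatable
h = random_rank_distribution(16, rng)
gap = capacity_gap(h, 2 ** 8)
```

For large $q$ the gap is roughly $(1 - h_0)/q$, which for $q = 2^8$ is a few times $10^{-3}$.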

V. An Example of BATS Codes 

We apply BATS codes in the network in Fig. Q] The source node s performs BATS code encoding. In each time 
slot, node s sends a packet to node a. Assume transmission is instantaneous and node a receives the packet, if not 
erased, at the same time slot. No matter whether certain packets are received or not, node a transmits at each time 
slot a linear combination of the packets it has received so far. After M time slots, node s switches to another batch 
and node a clears its buffer for the last batch. These operations of the network minimize the transmission delay 
and are asymptotically optimal when M goes to infinity. 

The operation at node $a$ for a batch is given by a random matrix $\Phi$, an $M \times M$ upper unitriangular¹ matrix with all the upper triangular, off-diagonal entries being independent and uniformly distributed. Let $E$ be an $M \times M$ random diagonal matrix with independent components. A diagonal component of $E$ is 0 with probability 0.2 and is 1 with probability 0.8. The matrix $E$ models the erasures in a link. The transfer matrix of the network is $H = E_1 \Phi E_2$, where $E_1$, $\Phi$ and $E_2$ are independent, and $E_1$ and $E_2$ follow the same distribution as $E$.

¹ A unitriangular matrix has unit entries on the main diagonal. The intermediate operation modelled by a unitriangular matrix becomes forwarding when $M = 1$.

TABLE I
Degree distributions for M = 16 and q = 2, 4, 8, 16, respectively. Only the dominant probability masses are listed; the sum of all other probability masses is less than 0.001.

           q = 2     q = 4     q = 8     q = 16
Ψ_13       0.1500    -         -         -
Ψ_14       0.3262    0.2691    0.1850    0.1593
Ψ_15       -         0.1225    0.1903    0.2105
Ψ_20       -         -         -         0.0020
Ψ_21       0.0491    0.2081    0.2078    0.2055
Ψ_22       0.1546    -         -         -
Ψ_28       -         -         -         0.0019
Ψ_29       -         0.0996    0.1172    0.1227
Ψ_30       -         -         0.0019    -
Ψ_31       0.0310    -         -         -
Ψ_32       0.0848    -         -         -
Ψ_35       -         0.0854    -         -
Ψ_36       -         -         0.0155    -
Ψ_37       -         -         0.0732    0.0437
Ψ_38       -         -         -         0.0481
Ψ_46       0.0986    -         -         -
Ψ_51       -         0.0936    -         -
Ψ_52       -         0.0168    -         -
Ψ_53       -         -         0.1073    0.0015
Ψ_54       -         -         -         0.1031
Ψ_81       0.1058    -         -         -
Ψ_91       -         0.1040    -         -
Ψ_94       -         -         0.1026    -
Ψ_95       -         -         -         0.1017
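The transfer-matrix model $H = E_1 \Phi E_2$ of this section can be sampled directly. The sketch below does so for the binary field only ($q = 2$) and estimates $E[\mathrm{rk}(H)]/M$ by Monte Carlo. It is a sketch under stated assumptions: rows are stored as integer bitmasks, the function names and the sample count are illustrative, and other field sizes would need a finite-field arithmetic layer that is not shown.

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                      # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def sample_rank(M, erase_p, rng):
    """Draw H = E1 * Phi * E2 over GF(2) and return rk(H)."""
    keep1 = [rng.random() >= erase_p for _ in range(M)]   # diagonal of E1
    keep2 = [rng.random() >= erase_p for _ in range(M)]   # diagonal of E2
    col_mask = sum(1 << j for j in range(M) if keep2[j])
    rows = []
    for i in range(M):
        if not keep1[i]:
            continue                              # row i of E1 * Phi is all zero
        row = 1 << i                              # unit diagonal entry of Phi
        for j in range(i + 1, M):                 # uniform entries above the diagonal
            if rng.random() < 0.5:
                row |= 1 << j
        rows.append(row & col_mask)               # E2 zeroes the erased columns
    return gf2_rank(rows)

def estimate_normalized_rank(M, erase_p, samples, seed=0):
    """Monte Carlo estimate of E[rk(H)] / M."""
    rng = random.Random(seed)
    return sum(sample_rank(M, erase_p, rng) for _ in range(samples)) / (samples * M)
```

Calling `estimate_normalized_rank(16, 0.2, 2000)` gives an estimate that can be compared with the $q = 2$ capacity entry for $M = 16$ reported in Table II.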

The rank distribution of $H$ is approximated by the empirical distribution obtained using $10^5$ independent samples of $H$. Using the (empirical) rank distribution, a degree distribution is obtained by solving (30) by taking discrete values of $x$. Table I lists some degree distributions for different parameters by setting $\eta = 0.08$.

BATS codes are rateless, i.e., the coding rate is not fixed. To see the performance of a BATS code, we use the average coding rate defined as follows. Consider that the source node encodes $K$ packets using a BATS code, and the decoder stops after recovering $\bar\eta K$ packets. Here we assume that a precode is used to first encode the original message into $K$ packets and any $\bar\eta K$ out of these $K$ packets are sufficient to recover the message by decoding the precode (cf. Section II-D). Repeat the above simulation $J$ times and let $n_j$ be the number of batches used when the decoder stops in the $j$th simulation. The average coding rate of the BATS code is defined as $\bar\eta K J / (M \sum_j n_j)$, where the rate is normalized by $M$ for the sake of comparison across different values of $M$. In the following, we will compare the average coding rates for different parameters, where the average coding rates are maximized over $\bar\eta$.

TABLE II
Average coding rates for M = 16.

K          q = 2     q = 4     q = 8     q = 16
16000      0.5466    0.6208    0.6384    0.6484
32000      0.5589    0.6377    0.6563    0.6636
64000      0.5671    0.6484    0.6670    0.6755
Capacity   0.6948    0.7074    0.7115    0.7125

TABLE III
Average coding rates for M = 32.

K          q = 2     q = 4     q = 8     q = 16
16000      0.5826    0.6145    0.6203    0.6248
32000      0.6087    0.6441    0.6524    0.6574
64000      0.6259    0.6655    0.6762    0.6818
Capacity   0.7178    0.7292    0.7325    0.7334
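The average coding rate defined above is simple to compute from simulation logs. The helper below is a minimal sketch; the function name and the example batch counts are made up for illustration and are not simulation results from the paper.

```python
def average_coding_rate(eta_bar, K, M, batches_used):
    """Average coding rate eta_bar*K*J / (M * sum_j n_j).

    eta_bar      : fraction of the K packets the decoder must recover
    K            : number of (precoded) input packets
    M            : batch size used for normalization
    batches_used : list [n_1, ..., n_J], batches consumed in each of J runs
    """
    J = len(batches_used)
    return eta_bar * K * J / (M * sum(batches_used))

# Hypothetical example: two runs that each stop after 1500 batches.
example_rate = average_coding_rate(0.92, 16000, 16, [1500, 1500])
```

Normalizing by $M$ expresses the rate in recovered packets per transmitted packet, which is what makes the entries for different batch sizes comparable.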

The first thing we want to show is that BATS codes outperform fountain codes. We know that when $M = 1$, BATS codes become Raptor codes and the intermediate operation $\Phi$ becomes forwarding. BATS codes can achieve rates exceeding 0.64, the routing capacity, which serves as an upper bound on the maximum achievable rate for Raptor codes (see the entries above 0.64 in Tables II and III).

The capacity of the LOC formed by the network operation (normalized by $M$) is $E[\mathrm{rk}(H)]/M$, which takes the field size $q$ and the batch size $M$ as parameters. The rows labeled Capacity in Tables II and III show the numerical values of $E[\mathrm{rk}(H)]/M$. The simulation results demonstrate that for fixed $q$ and $M$, when $K$ becomes larger, the achievable rate approaches the capacity. For any fixed $q$, it is not difficult to show that

$$\frac{E[\mathrm{rk}(H)]}{M} \to 0.8, \quad M \to \infty.$$

So when $M$ is large, BATS codes can potentially achieve higher rates. Our simulation results also illustrate this trend. We observe that when $M$ becomes larger, capacity values are generally higher when the field sizes are the same.

Another trend we observe is that using large $q$ also increases the rates. A closer look at the simulations further reveals that the gain by increasing $q$ becomes smaller when $q$ is large. For example, when $M = 32$ and $K = 32000$, increasing $q$ from 2 to 4 gains 5.82% in the rate, but increasing $q$ from 4 to 8 gains only 1.29%.
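The quoted percentages are relative gains, and they can be reproduced from the $K = 32000$ row of Table III ($M = 32$). The short check below copies the three rates from that table; the variable and function names are illustrative.

```python
# Average coding rates for M = 32, K = 32000, copied from Table III.
rates = {2: 0.6087, 4: 0.6441, 8: 0.6524}

def relative_gain(old, new):
    """Relative increase of `new` over `old`."""
    return (new - old) / old

gain_2_to_4 = relative_gain(rates[2], rates[4])   # about 0.0582
gain_4_to_8 = relative_gain(rates[4], rates[8])   # about 0.0129
```

The first doubling of the field size buys several times more rate than the second, which matches the diminishing-returns trend described above.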

VI. Concluding Remarks 

Benefiting from network coding and the properties of fountain codes, BATS codes are well suited for transmitting files through communication networks. Besides low encoding/decoding complexity, BATS codes can be realized with constant computation and storage complexity at the intermediate nodes. This desirable property makes BATS codes a suitable candidate for building universal network-coding-based network devices that can potentially replace routers.

In this paper, we mainly discussed the design of BATS codes for one destination node. Given a rank distribution, we can design BATS codes that achieve nearly optimal rates when the empirical distribution of the transfer matrix ranks converges to that rank distribution. For more practical applications, we need to consider BATS codes for multiple destination nodes, which may have different empirical distributions of the transfer matrix ranks, and to design BATS codes for an unknown empirical distribution of the transfer matrix ranks. The sufficient condition on the degree distribution for successful decoding (Theorem 1) can be readily applied to multiple rank distributions. We leave the discussion of designing BATS codes for these scenarios to future work.

Appendix I
Linear Operator Channels 

The network operation given by the linear network coding on batches described in Section II-B can be modelled by a linear operator channel (LOC) over finite fields. Let $X_i$ be a $T \times M$ random matrix representing the $i$th input, let $Y_i$ be a $T$-row random matrix representing the output of the network for the $i$th use, and let $H_i$ be an $M$-row random matrix representing the network operation. A LOC with input $X_i$, $i = 1, 2, \ldots$, and output $Y_i$, $i = 1, 2, \ldots$, is given by

$$Y_i = X_i H_i.$$

We assume that the instances of $H_i$ are unknown at the transmitter but known at the receiver. The number of columns of $H_i$ is arbitrary but finite.

Let $X^n = (X_1, \ldots, X_n)$. $Y^n$ and $H^n$ are defined similarly. We assume that $X^n$ and $H^n$ are independent for all $n$. When $H_i$, $i = 1, 2, \ldots$, are independent and follow the same but arbitrary distribution of a random matrix $H$, the LOC is a discrete memoryless channel (DMC) and its capacity is $E[\mathrm{rk}(H)]$ packets per use (see [28], [32]). Here we show that the capacity of the LOC can be similarly characterized when the transfer matrices change in an arbitrary way defined as follows.

Let

$$\pi_k \triangleq \frac{|\{i : 1 \le i \le n,\ \mathrm{rk}(H_i) = k\}|}{n}.$$

Note that $\pi_k$ depends on $n$. Let $(h_0, \ldots, h_M)$ be a probability vector. We assume that the convergence of the matrix ranks satisfies

$$\Pr\{|\pi_k - h_k| \le \sigma(n),\ k = 0, \ldots, M\} \ge 1 - \psi(n), \tag{43}$$

where $\sigma(n) = o(1)$ and $\psi(n) = o(1)$. In other words, the empirical rank distribution of $H^n$ converges to $(h_0, \ldots, h_M)$ as $n$ goes to infinity. Note that the above assumption on the convergence of $\{\pi_k\}$ is valid when $\{H_i\}$ are i.i.d. and $\mathrm{rk}(H_i)$ follows the distribution $h$. Further, we assume that $X^n$ and $H^n$ are statistically independent for all $n$.

A LOC described above is an arbitrarily varying channel (AVC) with convergent state constraints (refer to [33, Chapter 6] and [34] for more information about AVCs). Here we are concerned with the capacity of the LOC for average error probability and randomized codes. For a randomized code, the encoder and the decoder can share common randomness which is independent of the channel states. Randomized codes can potentially achieve higher rates than deterministic codes. Based on the results on the capacity of AVCs with state constraints [35], the capacity of AVCs with convergent state constraints can be obtained [36]. As a special case, the capacity of the LOC with constraint (43) is $\sum_{k=1}^{M} k h_k$ packets per use.

Appendix II 
Some Lemmas 

A. Rank of a Random Matrix 

All matrices discussed here are over the finite field $F$ with $q$ elements. Define

$$\zeta_r^m \triangleq \begin{cases} (1 - q^{-m})(1 - q^{-m+1}) \cdots (1 - q^{-m+r-1}) & r > 0, \\ 1 & r = 0. \end{cases}$$

For $1 \le r \le m$, we can count as follows (see also [37], [38]) that the number of full rank $m \times r$ matrices is

$$q^{mr} \zeta_r^m = (q^m - 1)(q^m - q) \cdots (q^m - q^{r-1}). \tag{44}$$

A full rank $m \times r$ matrix can be obtained by picking its $r$ columns from $F^m$ one by one. The first column has $q^m - 1$ choices, and the $i$th column, $1 < i \le r$, cannot be picked in the subspace spanned by the first $i - 1$ columns, and hence has $q^m - q^{i-1}$ choices. So (44) is the number of full rank $m \times r$ matrices. We say a matrix is totally random if all its components are uniformly i.i.d. By the above counting problem, the probability that an $m \times r$ totally random matrix is full rank is exactly $\zeta_r^m$.

Let $G_d$ be a totally random matrix with $d$ rows and $M$ columns, and let $H$ be an arbitrary random matrix with $M$ rows. We show in the following that for $i \le M$ and $r \le \min\{d, i\}$,

$$\Pr\{\mathrm{rk}(G_d H) = r \mid \mathrm{rk}(H) = i\} = \frac{\zeta_r^d \zeta_r^i}{\zeta_r^r\, q^{(d-r)(i-r)}} \triangleq \zeta_r^{d,i}. \tag{45}$$

Fix any instance of $H$ with rank $i$. The $d$ rows of $G_d H$ are i.i.d. and are uniformly distributed over the $i$-dimensional vector space spanned by the rows of $H$. Thus, the probability of $\mathrm{rk}(G_d H) = r$ is equal to the probability of a totally random $d \times i$ matrix being rank $r$. The latter is the ratio of the number of $d \times i$ matrices with rank $r$, which is $q^{dr} \zeta_r^d\, q^{ir} \zeta_r^i / (q^{rr} \zeta_r^r)$ (see [38], [39]), to the total number $q^{di}$ of $d \times i$ matrices. Thus,

$$\Pr\{\mathrm{rk}(G_d H) = r\} = \frac{q^{dr} \zeta_r^d\, q^{ir} \zeta_r^i / (q^{rr} \zeta_r^r)}{q^{di}} = \zeta_r^{d,i}.$$
Noting that the above probability is only related to the rank of $H$, the verification of (45) is completed.

Let $h_i = p_{\mathrm{rk}(H)}(i)$. Using (45), we have for $r \le \min\{d, M\}$,

$$\Pr\{\mathrm{rk}(G_d H) = r\} = \sum_{i=r}^{M} \Pr\{\mathrm{rk}(G_d H) = r \mid \mathrm{rk}(H) = i\}\, p_{\mathrm{rk}(H)}(i) = \sum_{i=r}^{M} \zeta_r^{d,i} h_i. \tag{46}$$

This gives another meaning of $h_{d,r}$ defined in (3), i.e.,

$$h_{d,r} = \Pr\{\mathrm{rk}(G_d H) = r\}. \tag{47}$$

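Let $g$ be a row of $G_d$ and let $G'_d$ be the submatrix of $G_d$ without the row $g$. We see that $\alpha_{d,r}$ defined in the main text is given by the conditional probability

$$\alpha_{d,r} = \Pr\{\mathrm{rk}(G'_d H) = r \mid \mathrm{rk}(G_d H) = r\}. \tag{48}$$

Clearly, $\alpha_{r,r} = 0$. When $d > r$,

$$\begin{aligned}
\alpha_{d,r} &= \frac{\Pr\{\mathrm{rk}(G'_d H) = r, \mathrm{rk}(G_d H) = r\}}{\Pr\{\mathrm{rk}(G_d H) = r\}} \\
&= \frac{\sum_i \Pr\{\mathrm{rk}(G'_d H) = r, \mathrm{rk}(G_d H) = r \mid \mathrm{rk}(H) = i\}\, h_i}{\Pr\{\mathrm{rk}(G_d H) = r\}} \\
&= \frac{1 - q^{-d+r}}{1 - q^{-d}} \cdot \frac{\sum_i \zeta_r^{d,i} h_i}{\Pr\{\mathrm{rk}(G_d H) = r\}} \qquad (49) \\
&= \frac{1 - q^{-d+r}}{1 - q^{-d}}, \qquad (50)
\end{aligned}$$

where (49) follows from

$$\begin{aligned}
\Pr\{\mathrm{rk}(G'_d H) = r, \mathrm{rk}(G_d H) = r \mid \mathrm{rk}(H) = i\} &= \Pr\{\mathrm{rk}(G'_d H) = r \mid \mathrm{rk}(H) = i\} \Pr\{\mathrm{rk}(G_d H) = r \mid \mathrm{rk}(H) = i, \mathrm{rk}(G'_d H) = r\} \\
&= \zeta_r^{d-1,i} \Pr\{gH \in \langle G'_d H \rangle \mid \mathrm{rk}(H) = i, \mathrm{rk}(G'_d H) = r\} \\
&= \zeta_r^{d-1,i}\, q^{r-i} = \zeta_r^{d,i}\, \frac{1 - q^{-d+r}}{1 - q^{-d}}, \qquad (51)
\end{aligned}$$

and (50) follows from (46).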

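As defined in (28) and (29), for $1 \le r \le M$ and $r \le d \le D$,

$$h_{d,r}^{(0)} \triangleq h_{d,r},$$

and for $0 \le i \le d - r - 1$ and $d > r$,

$$h_{d,r}^{(i+1)} \triangleq \alpha_{d-i,r}\, h_{d,r}^{(i)} + \bar\alpha_{d-i,r+1}\, h_{d,r+1}^{(i)}.$$

Lemma 4: $h_{d,r}^{(i)} = h_{d-i,r}$ for $d > r$ and $1 \le i \le d - r$.

Proof: First we show that $h_{d,r}^{(1)} = h_{d-1,r}$ for $d > r$. Let $R = \mathrm{rk}(G_d H)$ and $R' = \mathrm{rk}(G'_d H)$. We have

$$\begin{aligned}
h_{d,r}^{(1)} &= \alpha_{d,r} h_{d,r} + \bar\alpha_{d,r+1} h_{d,r+1} \\
&= \Pr\{R' = r \mid R = r\} \Pr\{R = r\} + \Pr\{R' = r \mid R = r+1\} \Pr\{R = r+1\} \qquad (52) \\
&= \Pr\{R' = r, R = r\} + \Pr\{R' = r, R = r+1\} \\
&= \Pr\{R' = r, R \in \{r, r+1\}\} \\
&= \Pr\{R' = r\} \Pr\{R \in \{r, r+1\} \mid R' = r\} \\
&= \Pr\{R' = r\} \qquad (53) \\
&= h_{d-1,r}, \qquad (54)
\end{aligned}$$

where (52) and (54) follow from (47) and (48), and (53) follows because $\mathrm{rk}(G_d H)$ must be either $r$ or $r+1$ under the condition that $\mathrm{rk}(G'_d H) = r$.

The general case of the lemma is proved by induction. Assume $h_{d,r}^{(i)} = h_{d-i,r}$ for $d > r$ and some $i < d - r$. Then

$$\begin{aligned}
h_{d,r}^{(i+1)} &= \alpha_{d-i,r}\, h_{d,r}^{(i)} + \bar\alpha_{d-i,r+1}\, h_{d,r+1}^{(i)} \\
&= \alpha_{d-i,r}\, h_{d-i,r} + \bar\alpha_{d-i,r+1}\, h_{d-i,r+1} \\
&= h_{d-i-1,r},
\end{aligned}$$

completing the proof. ■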
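By (47) and (48), $\hbar'_r$ defined in (26) can be written as

\[
\hbar'_r = \Pr\{\mathrm{rk}(G'_{r+1} H) = r,\ \mathrm{rk}(G_{r+1} H) = r\}. \tag{55}
\]

Lemma 5:

\[
\sum_{k=r}^{M} \hbar'_k = \hbar_{r,r}.
\]

Proof: By (51) and (55),

\[
\hbar'_r = \sum_{i=r}^{M} \Pr\{\mathrm{rk}(G'_{r+1} H) = r,\ \mathrm{rk}(G_{r+1} H) = r \mid \mathrm{rk}(H) = i\}\, \Pr\{\mathrm{rk}(H) = i\} = \sum_{i=r}^{M} \frac{\zeta_r^{i}}{q^{i-r}}\, h_i. \tag{56}
\]

Therefore,

\begin{align*}
\sum_{k=r}^{M} \hbar'_k &= \sum_{k=r}^{M} \sum_{i=k}^{M} \frac{\zeta_k^{i}}{q^{i-k}}\, h_i \\
&= \sum_{i=r}^{M} h_i \sum_{k=r}^{i} \frac{\zeta_k^{i}}{q^{i-k}} \\
&= \sum_{i=r}^{M} h_i\, \zeta_r^{i} = \hbar_{r,r}.
\end{align*}
■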
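The last step of the proof of Lemma 5 rests on the inner sum $\sum_{k=r}^{i} \zeta_k^i q^{-(i-k)}$ collapsing to $\zeta_r^i$. The sketch below checks this numerically. It is an illustration only and assumes the definition $\zeta_r^m = \prod_{j=0}^{r-1}(1-q^{-m+j})$, under which $\zeta_k^i q^{-(i-k)} = \zeta_k^i - \zeta_{k+1}^i$ and the sum telescopes. The helper names and the rank distribution `h` are hypothetical.

```python
from fractions import Fraction


def zeta(m, r, q):
    """zeta_r^m = prod_{j=0}^{r-1} (1 - q^(j-m)) (assumed definition)."""
    out = Fraction(1)
    for j in range(r):
        out *= 1 - Fraction(q) ** (j - m)
    return out


def hbar_prime(k, q, h):
    """sum_{i>=k} zeta_k^i q^(k-i) h_i, the right-hand side of (56)."""
    M = len(h) - 1
    return sum(zeta(i, k, q) * Fraction(q) ** (k - i) * h[i]
               for i in range(k, M + 1))


def hbar_rr(r, q, h):
    """sum_{i>=r} zeta_r^i h_i, i.e. hbar_{r,r}."""
    M = len(h) - 1
    return sum(zeta(i, r, q) * h[i] for i in range(r, M + 1))


def telescoping_gap(i, k, q):
    """zeta_k^i q^(k-i) - (zeta_k^i - zeta_{k+1}^i); zero for k <= i."""
    lhs = zeta(i, k, q) * Fraction(q) ** (k - i)
    return lhs - (zeta(i, k, q) - zeta(i, k + 1, q))


# Hypothetical rank distribution with M = 4 (illustration only).
h = [Fraction(1, 20), Fraction(3, 20), Fraction(4, 20),
     Fraction(5, 20), Fraction(7, 20)]
M = len(h) - 1
for r in range(1, M + 1):
    total = sum(hbar_prime(k, 2, h) for k in range(r, M + 1))
    print(r, total, hbar_rr(r, 2, h))
```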
Lemma 6: $\sum_{r=1}^{M} r\, \hbar'_r \le \sum_{r=1}^{M} r\, h_r$.

Proof: By Lemma 5, $\sum_{k=r}^{M} \hbar'_k = \hbar_{r,r} = \sum_{k=r}^{M} \zeta_r^{k} h_k \le \sum_{k=r}^{M} h_k$, where the last inequality follows from $\zeta_r^{k} \le 1$. Hence,

\[
\sum_{r=1}^{M} r\, \hbar'_r = \sum_{r=1}^{M} \sum_{k=r}^{M} \hbar'_k \le \sum_{r=1}^{M} \sum_{k=r}^{M} h_k = \sum_{r=1}^{M} r\, h_r.
\]
■

B. Incomplete Beta Function

Beta function with integer parameters is used extensively in this work. Related results are summarized here. For positive integers $a$ and $b$, the beta function is defined by

\[
B(a,b) = \int_0^1 t^{a-1} (1-t)^{b-1}\, dt = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}.
\]

The (regularized) incomplete beta function is defined as

\begin{align*}
I_{a,b}(x) &= \frac{\int_0^x t^{a-1} (1-t)^{b-1}\, dt}{B(a,b)} \tag{57} \\
&= \sum_{j=a}^{a+b-1} \binom{a+b-1}{j} x^{j} (1-x)^{a+b-1-j}.
\end{align*}

For more general discussion of beta functions, as well as incomplete beta functions, please refer to [40].

Using the above definitions, we can easily show that

\[
\int_0^1 I_{a,b}(x)\, dx = \frac{b}{a+b}, \tag{58}
\]

and

\[
I_{a+1,b}(x) = I_{a,b}(x) - \frac{x^{a} (1-x)^{b}}{a B(a,b)}. \tag{59}
\]
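Both identities can be confirmed exactly for integer parameters. The sketch below is an illustration with hypothetical function names. It evaluates $I_{a,b}(x)$ through its binomial-sum form and integrates it term by term, using $\int_0^1 \binom{n}{j} x^j (1-x)^{n-j}\,dx = \binom{n}{j} B(j+1, n-j+1) = 1/(n+1)$ with $n = a+b-1$. Each of the $b$ terms therefore contributes $1/(a+b)$, which gives (58).

```python
from fractions import Fraction
from math import comb, factorial


def beta(a, b):
    """Beta function for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def inc_beta(a, b, x):
    """Regularized incomplete beta function via the binomial sum."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(a, n + 1))


def integral_inc_beta(a, b):
    """Exact integral of I_{a,b} over [0, 1], integrating term by term."""
    n = a + b - 1
    return sum(comb(n, j) * beta(j + 1, n - j + 1) for j in range(a, n + 1))


x = Fraction(3, 10)
for a, b in [(1, 1), (2, 3), (5, 2)]:
    # Difference I_{a,b}(x) - I_{a+1,b}(x) against x^a (1-x)^b / (a B(a,b)).
    gap = inc_beta(a, b, x) - inc_beta(a + 1, b, x)
    print(a, b, gap == x ** a * (1 - x) ** b / (a * beta(a, b)),
          integral_inc_beta(a, b))
```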



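Lemma 7: $\dfrac{I_{a+1,b}(x)}{I_{a,b}(x)}$ is monotonically increasing in $x$.

Proof: By (59),

\begin{align*}
\frac{I_{a+1,b}(x)}{I_{a,b}(x)} &= 1 - \frac{x^{a} (1-x)^{b}}{a B(a,b)\, I_{a,b}(x)} \\
&= 1 - \frac{1}{a B(a,b) \sum_{j=a}^{a+b-1} \binom{a+b-1}{j} x^{j-a} (1-x)^{a-1-j}},
\end{align*}

in which $x^{j-a} (1-x)^{a-1-j}$ is monotonically increasing. ■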
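Lemma 8: When $\frac{b-1}{a+b} \le \eta$, where $0 < \eta < 1$, $\frac{I_{a+1,b}(x)}{I_{a,b}(x)} \le 1 - \frac{\eta}{b}$ for $0 < x \le 1 - \eta$, with equality when $b = 1$ and $x = 1 - \eta$.

Proof: Since $\frac{I_{a+1,b}(x)}{I_{a,b}(x)}$ is monotonically increasing in $x$ (cf. Lemma 7), it is sufficient to show $\frac{I_{a+1,b}(1-\eta)}{I_{a,b}(1-\eta)} \le 1 - \frac{\eta}{b}$. Since $(a+1)\eta \ge (b-1)(1-\eta)$, the summands below are non-increasing in $j$, so each of the $b$ summands is at most the first one:

\[
I_{a,b}(1-\eta) = \sum_{j=a}^{a+b-1} \binom{a+b-1}{j} (1-\eta)^{j} \eta^{a+b-1-j} \le b \binom{a+b-1}{a} (1-\eta)^{a} \eta^{b-1},
\]

where the equality holds for $b = 1$. Thus,

\begin{align*}
\frac{I_{a+1,b}(1-\eta)}{I_{a,b}(1-\eta)} &= 1 - \frac{(1-\eta)^{a} \eta^{b}}{a B(a,b)\, I_{a,b}(1-\eta)} \\
&\le 1 - \frac{(1-\eta)^{a} \eta^{b}}{a b\, B(a,b) \binom{a+b-1}{a} (1-\eta)^{a} \eta^{b-1}} \\
&= 1 - \frac{\eta}{b}.
\end{align*}
■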



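We will use the following result about the summation of binomial coefficients:

\[
\sum_{j=0}^{n} (-1)^{n-j} \binom{j+m}{n} \binom{n}{j} = 1. \tag{60}
\]

The above equality can be verified as follows:

\begin{align*}
\sum_{j=0}^{n} (-1)^{n-j} \binom{j+m}{n} \binom{n}{j} &= \sum_{j=0}^{n} (-1)^{n-j} \binom{j+m}{j+m-n} \binom{n}{j} \\
&= \sum_{j=0}^{n} (-1)^{m} \binom{-n-1}{j+m-n} \binom{n}{j} \tag{61} \\
&= (-1)^{m} \sum_{j=0}^{n} \binom{-n-1}{j+m-n} \binom{n}{n-j} \\
&= (-1)^{m} \binom{-1}{m} \tag{62} \\
&= 1, \tag{63}
\end{align*}

where (62) follows from Vandermonde's identity; (61) and (63) use the relation between binomial coefficients with negative integers and positive integers.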
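A brute-force check of the binomial identity is a useful guard against sign or index slips. The sketch below assumes the identity in the form $\sum_{j=0}^{n} (-1)^{n-j}\binom{j+m}{n}\binom{n}{j} = 1$ for integers $m, n \ge 0$. The function name is hypothetical.

```python
from math import comb


def alternating_sum(m, n):
    """sum_{j=0}^{n} (-1)^(n-j) C(j+m, n) C(n, j).

    math.comb(j + m, n) is 0 whenever n > j + m, which matches the usual
    convention for binomial coefficients outside the triangle.
    """
    return sum((-1) ** (n - j) * comb(j + m, n) * comb(n, j)
               for j in range(n + 1))


for m in range(4):
    print(m, [alternating_sum(m, n) for n in range(6)])
```

One way to read the result: the sum is the $n$-th forward difference at $0$ of the degree-$n$ polynomial $x \mapsto \binom{x+m}{n}$, whose leading coefficient is $1/n!$, so the difference equals 1 for every $m$.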
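Lemma 9: For $r \ge 1$,

\[
\sum_{d=r+1}^{\infty} \frac{1}{d-1}\, I_{d-r,r}(x) = -\ln(1-x), \qquad x \in [0, 1).
\]

Proof: As a special case, when $r = 1$, the equality becomes

\[
\sum_{d=2}^{\infty} \frac{x^{d-1}}{d-1} = -\ln(1-x), \tag{64}
\]

which is the Taylor expansion of $-\ln(1-x)$ for $x \in [0, 1)$.

To prove the general case, let us first derive an alternative form of $I_{d-r,r}(x)$. For $a > 0$,

\begin{align*}
I_{a,b}(x) &= \sum_{j=a}^{a+b-1} \binom{a+b-1}{j} x^{j} \sum_{i=0}^{a+b-1-j} \binom{a+b-1-j}{i} (-1)^{i} x^{i} \\
&= \sum_{m=a}^{a+b-1} x^{m} \sum_{j=a}^{m} \binom{a+b-1}{j} \binom{a+b-1-j}{m-j} (-1)^{m-j} \\
&= \sum_{m=a}^{a+b-1} x^{m} \binom{a+b-1}{m} \sum_{j=a}^{m} \binom{m}{j} (-1)^{m-j} \\
&= \sum_{m=a}^{a+b-1} x^{m} \binom{a+b-1}{m} \binom{m-1}{m-a} (-1)^{m-a} \\
&= \sum_{m=a}^{a+b-1} \frac{b}{m} \binom{a+b-1}{b} \binom{b-1}{m-a} (-1)^{m-a} x^{m}.
\end{align*}

Using this form for $I_{d-r,r}(x)$, we have

\begin{align*}
\sum_{d=r+1}^{\infty} \frac{I_{d-r,r}(x)}{d-1} &= \sum_{d=r+1}^{\infty} \sum_{m=d-r}^{d-1} \frac{x^{m}}{m} \frac{r}{d-1} \binom{d-1}{r} \binom{r-1}{m-d+r} (-1)^{m-d+r} \\
&= \sum_{m=1}^{\infty} \frac{x^{m}}{m} \sum_{d=\max\{m,r\}+1}^{m+r} \frac{r}{d-1} \binom{d-1}{r} \binom{r-1}{m-d+r} (-1)^{m-d+r} \\
&= \sum_{m=1}^{\infty} \frac{x^{m}}{m} A_m, \tag{65}
\end{align*}

where

\[
A_m = \sum_{d=\max\{m,r\}+1}^{m+r} \frac{r}{d-1} \binom{d-1}{r} \binom{r-1}{m-d+r} (-1)^{m-d+r}.
\]

For $m \le r$,

\begin{align*}
A_m &= \sum_{d=r+1}^{m+r} \frac{r}{d-1} \binom{d-1}{r} \binom{r-1}{m-d+r} (-1)^{m-d+r} \\
&= \sum_{d=r+1}^{m+r} \binom{d-2}{r-1} \binom{r-1}{m-d+r} (-1)^{m-d+r} \\
&= \sum_{j=0}^{m-1} \binom{j+r-1}{r-1} \binom{r-1}{m-1-j} (-1)^{m-1-j} \\
&= \sum_{j=0}^{m-1} \binom{j+r-1}{m-1} \binom{m-1}{j} (-1)^{m-1-j} \\
&= 1,
\end{align*}

where the last equality follows from (60). Similarly, for $m > r$,

\begin{align*}
A_m &= \sum_{d=m+1}^{m+r} \frac{r}{d-1} \binom{d-1}{r} \binom{r-1}{m-d+r} (-1)^{m-d+r} \\
&= \sum_{d=m+1}^{m+r} \binom{d-2}{r-1} \binom{r-1}{m-d+r} (-1)^{m-d+r} \\
&= \sum_{j=0}^{r-1} \binom{j+m-1}{r-1} \binom{r-1}{j} (-1)^{r-1-j} \\
&= 1.
\end{align*}

The proof is completed by referring to (64) and (65) with $A_m = 1$. ■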
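Lemma 9 lends itself to a quick numerical sanity check. The sketch below is an illustration with hypothetical names. It evaluates $A_m$ exactly in the form $\sum_{d=\max\{m,r\}+1}^{m+r} \binom{d-2}{r-1}\binom{r-1}{m-d+r}(-1)^{m-d+r}$, and compares a truncated version of the series $\sum_{d \ge r+1} I_{d-r,r}(x)/(d-1)$ with $-\ln(1-x)$ in floating point. The truncation point `d_max` is an arbitrary choice that is adequate only for moderate $x$, because the tail decays roughly geometrically in $x$.

```python
import math
from math import comb


def inc_beta(a, b, x):
    """Regularized incomplete beta function for positive integers a, b."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(a, n + 1))


def coefficient_a(m, r):
    """A_m for a given r, computed with exact integer arithmetic."""
    lo = max(m, r) + 1
    return sum(comb(d - 2, r - 1) * comb(r - 1, m - d + r) * (-1) ** (m - d + r)
               for d in range(lo, m + r + 1))


def partial_sum(r, x, d_max=200):
    """Truncation of sum_{d >= r+1} I_{d-r,r}(x) / (d - 1)."""
    return sum(inc_beta(d - r, r, x) / (d - 1) for d in range(r + 1, d_max + 1))


for r in (1, 2, 3):
    print(r, [coefficient_a(m, r) for m in range(1, 7)],
          partial_sum(r, 0.5), -math.log(0.5))
```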

Appendix III 
Layered Decoding Graph 

We have discussed different decoding strategies under the rule that a check node is decodable if and only if its rank equals its degree. We say a variable node is decodable if it is connected to a decodable check node. In Section II-C, a decodable check node is chosen and all its neighbors (variable nodes) are recovered simultaneously, while in Section III-A, a decodable variable node is uniformly chosen to be recovered. Here we show that under the decoding rule that a check node is decodable if and only if its rank equals its degree, both strategies stop with the same subset of the variable nodes undecoded.

For a given decoding graph $\mathcal{G}$, let $\mathcal{G}^0 = \mathcal{G}$. Label by $L_1$ all the decodable check nodes in $\mathcal{G}^0$ and label by $L_2$ all the variable nodes in $\mathcal{G}^0$ connected to the check nodes with label $L_1$. We repeat the above procedure as follows. For $i = 1, 2, \ldots$, let $\mathcal{G}^i$ be the subgraph of $\mathcal{G}$ obtained by removing all the nodes with labels $L_j$ for $j \le 2i$, as well as the related edges. (The generator matrices of the check nodes are also updated.) Label by $L_{2i+1}$ all the decodable check nodes in $\mathcal{G}^i$ and label by $L_{2i+2}$ all the variable nodes in $\mathcal{G}^i$ connected to the check nodes with label $L_{2i+1}$. This procedure stops when $\mathcal{G}^i$ has no more decodable check nodes. Let $i_0$ be the index where the procedure stops. The above labelling procedure is deterministic and generates a unique label for each decodable variable node and check node.

With the labels, we can generate a layered subgraph $\mathcal{G}'$ of $\mathcal{G}$. In $\mathcal{G}'$, layer $j$, $j = 1, 2, \ldots, 2i_0$, contains all the check/variable nodes with label $L_j$. Only the edges connecting two nodes belonging to two consecutive layers are preserved in $\mathcal{G}'$. By the assigning rule of the labels, it is clear that a variable node on layer $2i$ must connect to one check node on layer $2i - 1$, $i = 1, \ldots, i_0$, since otherwise the variable node is not decodable. Further, a check node on layer $2i + 1$ must connect to some variable nodes on layer $2i$, $i = 1, \ldots, i_0 - 1$, since otherwise the check node should be on layer $2i - 1$.

By the definition of decodability, a decoding strategy must process the variable/check nodes in $\mathcal{G}'$ following an order such that a variable/check node is processed after all its lower layer descendant variable/check nodes have been processed. The two random decoding strategies we have discussed in Section II-C and Section III-A both can process all the nodes in $\mathcal{G}'$ before stopping.
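The claim that both strategies stop with the same undecoded set can be illustrated on small random instances. The sketch below is a toy model and not the decoder of the paper. It works over GF(2), stores for each check node one generator row per neighboring variable node as an integer bitmask, and treats the recovery of a variable node as the deletion of its row. The graph generator, all names, and all parameters are hypothetical.

```python
import random


def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    pivots = {}  # leading-bit position -> reduced row
    for v in rows:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def decodable_checks(checks, decoded):
    """Checks whose rank equals their (nonzero) residual degree."""
    out = []
    for c, rows in enumerate(checks):
        live = [row for v, row in rows.items() if v not in decoded]
        if live and gf2_rank(live) == len(live):
            out.append(c)
    return out


def layered_labelling(checks):
    """Decode layer by layer; returns (decoded variables, number of rounds)."""
    decoded, rounds = set(), 0
    while True:
        layer = decodable_checks(checks, decoded)
        if not layer:
            return decoded, rounds
        decoded |= {v for c in layer for v in checks[c]}
        rounds += 1


def random_check_decoder(checks, rng):
    """Pick a random decodable check and recover all its neighbors."""
    decoded = set()
    while True:
        cand = decodable_checks(checks, decoded)
        if not cand:
            return decoded
        decoded |= set(checks[rng.choice(cand)])


def random_variable_decoder(checks, rng):
    """Pick a random decodable variable node and recover only that node."""
    decoded = set()
    while True:
        cand = sorted({v for c in decodable_checks(checks, decoded)
                       for v in checks[c] if v not in decoded})
        if not cand:
            return decoded
        decoded.add(rng.choice(cand))


def random_graph(num_vars, num_checks, max_deg, width, rng):
    """Hypothetical generator: each check gets random neighbors and rows."""
    checks = []
    for _ in range(num_checks):
        nbrs = rng.sample(range(num_vars), rng.randint(1, max_deg))
        checks.append({v: rng.getrandbits(width) for v in nbrs})
    return checks


rng = random.Random(0)
graph = random_graph(num_vars=30, num_checks=40, max_deg=6, width=4, rng=rng)
layered, rounds = layered_labelling(graph)
print(len(layered), rounds,
      layered == random_check_decoder(graph, rng),
      layered == random_variable_decoder(graph, rng))
```

The agreement reflects a monotonicity property: deleting rows from a set of linearly independent rows leaves the remaining rows independent, so a check node that is decodable stays decodable until all its neighbors are recovered.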

Appendix IV 
Concentration 

Theorem 1 is proved by applying a general theorem by Wormald [29], [31].

A. Wormald's Differential Equation

The statement of the next theorem follows that of [29, Theorem 5.1] with an extra initial condition. A similar version is provided in [31, Theorem C.28] with the boundedness condition holding deterministically.

We say a function $f(u_1, \ldots, u_j)$ satisfies a Lipschitz condition on $\mathcal{D} \subseteq \mathbb{R}^j$ if there exists a constant $C_L$ such that

\[
|f(u_1, \ldots, u_j) - f(v_1, \ldots, v_j)| \le C_L \max_{1 \le i \le j} |u_i - v_i|
\]

for all $(u_1, \ldots, u_j)$ and $(v_1, \ldots, v_j)$ in $\mathcal{D}$. We call $C_L$ the Lipschitz constant for $f$. Note that $\max_{1 \le i \le j} |u_i - v_i|$ is the distance between $(u_1, \ldots, u_j)$ and $(v_1, \ldots, v_j)$ in the $\ell^\infty$-norm.

Theorem 5: Let $\mathcal{G}_0, \mathcal{G}_1, \ldots$ be a random process with a positive integer parameter $n$, and let $(Y_l(t))_{l=1}^{L}$ be a random vector determined by $\mathcal{G}_0, \ldots, \mathcal{G}_t$. For some constant $C_0$ and all $l$, $|Y_l(t)| < C_0 n$ for $t \ge 0$ and all $n$. Let $\mathcal{D}$ be some bounded connected open set containing the closure of

\[
\{(0, z_1, \ldots, z_L) : \exists n,\ \Pr\{Y_l(0) = z_l n,\ 1 \le l \le L\} \ne 0\}.
\]

Define the stopping time $T_{\mathcal{D}}$ to be the minimum $t$ such that $(t/n, Y_1(t)/n, \ldots, Y_L(t)/n) \notin \mathcal{D}$. Assume the following conditions hold.

(i) (Boundedness) For some functions $\beta = \beta(n) \ge 1$ and $\gamma = \gamma(n)$, the probability that

\[
\max_{1 \le l \le L} |Y_l(t+1) - Y_l(t)| \le \beta
\]

is at least $1 - \gamma$ for $t < T_{\mathcal{D}}$.

(ii) (Trend) For some function $\lambda_1 = \lambda_1(n) = o(1)$, if $t < T_{\mathcal{D}}$,

\[
\mathrm{E}[Y_l(t+1) - Y_l(t) \mid \mathcal{G}_1, \ldots, \mathcal{G}_t] = f_l\left(\frac{t}{n}, \frac{Y_1(t)}{n}, \ldots, \frac{Y_L(t)}{n}\right) + O(\lambda_1),
\]

for $1 \le l \le L$.

(iii) (Lipschitz) Each function $f_l$ satisfies a Lipschitz condition on $\mathcal{D} \cap \{(t, z_1, \ldots, z_L) : t \ge 0\}$ with the same Lipschitz constant $C_L$ for each $l$.

(iv) (Initial condition) For some point $(0, z_1^0, \ldots, z_L^0) \in \mathcal{D}$,

\[
|Y_l(0)/n - z_l^0| \le \sigma = o(1), \qquad 0 < l \le L.
\]

Then the following are true.

(a) For $(0, (\hat z_l)_{l=1}^{L}) \in \mathcal{D}$, the system of differential equations

\[
\frac{dz_l}{d\tau} = f_l\left(\tau, (z_{l'}(\tau))_{l'=1}^{L}\right), \qquad l = 1, \ldots, L,
\]

has a unique solution in $\mathcal{D}$ for $z_l : \mathbb{R} \to \mathbb{R}$ passing through $z_l(0) = \hat z_l$, $l = 1, \ldots, L$, and this solution extends to points arbitrarily close to the boundary of $\mathcal{D}$.

(b) Let $\lambda > \max\{\sigma, \lambda_1 + C_0 n \gamma\}$ with $\lambda = o(1)$. There exists a sufficiently large constant $C_1$ such that when $n$ is sufficiently large, with probability $1 - O\left(n\gamma + \frac{\beta}{\lambda} \exp\left(-\frac{n\lambda^3}{\beta^3}\right)\right)$,

\[
|Y_l(t) - n z_l(t/n)| = O(\lambda n) \tag{66}
\]

uniformly for $0 \le t \le \bar\tau n$ and for each $l$, where $\hat z_l = z_l^0$ and $\bar\tau = \bar\tau(n)$ is the supremum of those $\tau$ to which the solution of the system of differential equations in (a) can be extended before reaching within $\ell^\infty$-distance $C_1 \lambda$ of the boundary of $\mathcal{D}$.

Proof: The proof follows exactly the proof of [29, Theorem 5.1] except for the place where we should handle the initial condition (iv). We only need to modify the definition of $B_j$ (below (5.9) in [29]) in the original proof to

\[
B_j = (n\lambda + w)\left(\left(1 + \frac{Bw}{n}\right)^{j} - 1\right) + B_0 \left(1 + \frac{Bw}{n}\right)^{j},
\]

where $B_0 = n\lambda$. The induction in the original proof now begins by the fact that $|z_l(0) - Y_l(0)/n| \le \sigma = O(\lambda)$. The other part of the proof stays the same as that of [29, Theorem 5.1]. ■

B. Proof of Theorem 1

We first prove two technical lemmas. For BATS($K$, $n$, $\Psi$, $h$), the degrees of the variable nodes are not independent but follow the same distribution. The following lemma shows that the degree of a variable node is not likely to be much larger than its expectation.

Lemma 10: Let $V$ be the degree of a variable node of BATS($K$, $n$, $\Psi$, $h$). For any $\alpha > 0$,

\[
\Pr\{V > (1+\alpha)\, \mathrm{E}[\Psi]/\theta\} < \left(\frac{e^{\alpha}}{(1+\alpha)^{(1+\alpha)}}\right)^{\mathrm{E}[\Psi]/\theta},
\]

where $\theta = K/n$.

Proof: Fix a variable node. Let $X_i$ be the indicator random variable of the $i$th check node being the neighbor of the specific variable node. Then $V = \sum_i X_i$. We have $\mathrm{E}[V] = \sum_i \mathrm{E}[X_i] = \sum_i \sum_d \Psi_d \frac{d}{K} = \frac{n\, \mathrm{E}[\Psi]}{K} = \mathrm{E}[\Psi]/\theta$. Since $X_i$, $i = 1, \ldots, n$, are mutually independent, the lemma is proved by applying the Chernoff bound. ■
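To get a feel for how loose the Chernoff bound is, it can be compared with an exact tail in a special case. The sketch below is an illustration only. It assumes a degenerate degree distribution in which every check node has the same degree $d$, so that the variable-node degree is exactly Binomial($n$, $d/K$) with mean $\mu = d/\theta$. It also evaluates the same bound in the exponential form $\exp(-x(\ln x - \ln \mu - 1) - \mu)$ obtained by writing $x = (1+\alpha)\mu$. All names and parameter values are hypothetical.

```python
import math


def chernoff_bound(mu, a):
    """(e^a / (1+a)^(1+a))^mu, the Chernoff bound with mean mu."""
    return (math.exp(a) / (1 + a) ** (1 + a)) ** mu


def exponent_form(mu, x):
    """exp(-x (ln x - ln mu - 1) - mu): the same bound with x = (1+a) mu."""
    return math.exp(-x * (math.log(x) - math.log(mu) - 1) - mu)


def binomial_upper_tail(n, p, threshold):
    """Pr{V > threshold} for V ~ Binomial(n, p)."""
    start = math.floor(threshold) + 1
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(start, n + 1))


# Hypothetical setting: n = 200 check nodes of degree 4, K = 100 variables.
n, K, d = 200, 100, 4
mu = n * d / K  # equals d / theta with theta = K / n
for a in (0.25, 0.5, 1.0, 2.0):
    x = (1 + a) * mu
    print(a, binomial_upper_tail(n, d / K, x),
          chernoff_bound(mu, a), exponent_form(mu, x))
```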

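The following lemma verifies the boundedness condition of Theorem 5.

Lemma 11: When $\beta/D > \mathrm{E}[\Psi]/\theta$, the probability that

\[
\max_{\iota \in \mathcal{F} \cup \{0\}} |R_\iota(t+1) - R_\iota(t)| \le \beta
\]

is at least

\[
1 - \theta n \exp\left(-\frac{\beta}{D}\left(\ln(\beta/D) - \ln(\mathrm{E}[\Psi]/\theta) - 1\right) - \frac{\mathrm{E}[\Psi]}{\theta}\right).
\]

Proof: Let $V$ be the degree of the variable node to be removed at the beginning of time $t + 1$. By the update rule of $R_{d,r}$ in the main text, we have for $(d, r) \in \mathcal{F}$,

\[
|R_{d,r}(t+1) - R_{d,r}(t)| \le DV,
\]

and by (19), we have

\[
|R_0(t+1) - R_0(t)| \le DV.
\]

Hence when $\beta/D > \mathrm{E}[\Psi]/\theta$,

\begin{align*}
\Pr\left\{\max_{\iota \in \mathcal{F} \cup \{0\}} |R_\iota(t+1) - R_\iota(t)| \le \beta\right\}
&\ge \Pr\{VD \le \beta\} \\
&\ge \Pr\{\text{the degrees of all variable nodes at time zero} \le \beta/D\} \\
&\ge 1 - \theta n \exp\left(-\frac{\beta}{D}\left(\ln(\beta/D) - \ln(\mathrm{E}[\Psi]/\theta) - 1\right) - \frac{\mathrm{E}[\Psi]}{\theta}\right),
\end{align*}

where the last inequality follows from Lemma 10 and the union bound. ■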
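Proof of Theorem 1: We consider in the proof only the instances of BATS($K$, $n$, $\Psi$, $h$) satisfying

\[
\left|\frac{R_{d,r}(0)}{n} - \rho_{d,r}\right| = O(n^{-1/6}), \qquad (d, r) \in \mathcal{F}. \tag{67}
\]

By Lemma 1, this will decrease the probability bounds we will obtain by at most $j(n) + 2MD \exp(-2n^{2/3})$.

Define the stopping time $T_0$ as the first time $t$ such that $R_0(t) = 0$. By defining proper functions $f_{d,r}$ and $f_0$, we can rewrite the expected one-step changes of $R_{d,r}$ and $R_0$ as

\begin{align*}
\mathrm{E}[R_{d,r}(t+1) - R_{d,r}(t) \mid R(t)] &= f_{d,r}\left(\frac{t}{n}, \frac{R_0(t)}{n}, \left(\frac{R_{d',r'}(t)}{n}\right)_{(d',r') \in \mathcal{F}}\right), \\
\mathrm{E}[R_0(t+1) - R_0(t) \mid R(t)] &= f_0\left(\frac{t}{n}, \frac{R_0(t)}{n}, \left(\frac{R_{d',r'}(t)}{n}\right)_{(d',r') \in \mathcal{F}}\right) + O\left(\frac{1}{n}\right),
\end{align*}

for $t < T_0$. For $\iota \in \mathcal{F} \cup \{0\}$, define random variable $\tilde R_\iota$ as $\tilde R_\iota(0) = R_\iota(0)$ and for $t \ge 0$,

\[
\tilde R_\iota(t+1) =
\begin{cases}
R_\iota(t+1), & t < T_0, \\
\tilde R_\iota(t) + f_\iota\left(\frac{t}{n}, \frac{\tilde R_0(t)}{n}, \left(\frac{\tilde R_{d',r'}(t)}{n}\right)_{(d',r') \in \mathcal{F}}\right), & t \ge T_0.
\end{cases}
\]

Note that $T_0$ is also the first time that $\tilde R_0(t)$ becomes zero.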

We apply Theorem 5 on $(\tilde R_0(t), (\tilde R_{d,r}(t))_{(d,r) \in \mathcal{F}})$ in place of $(Y_l(t))_{l=1}^{L}$. The region $\mathcal{D}$ is defined as

\[
\mathcal{D} = (-\eta, (1 - \eta/2)\theta) \times (-M, M + \eta) \times \prod_{(d,r) \in \mathcal{F}} (-\eta, d).
\]

So 1) $t/n$ is in the interval $(-\eta, (1 - \eta/2)\theta)$; 2) $\tilde R_0(t)/n$ is in the interval $(-M, M + \eta)$; and 3) $\tilde R_{d,r}(t)/n$, $(d, r) \in \mathcal{F}$, is in the interval $(-\eta, d)$. As required, $\mathcal{D}$ is a bounded connected open set containing all the possible initial states $(0, R_0(0)/n, (R_{d,r}(0)/n)_{(d,r) \in \mathcal{F}})$.

The conditions of Theorem 5 are ready to be verified. When $t \ge T_0$, the change $|\tilde R_\iota(t+1) - \tilde R_\iota(t)|$ for $\iota \in \mathcal{F} \cup \{0\}$ is deterministic and upper bounded. When $t < T_0$, by Lemma 11 with $\beta = n^{1/8}$, the boundedness condition (i) holds with

\[
\gamma = n \exp\left(-n^{1/8} (c_{1,3} \ln n - c_{1,1}) - c_{1,2}\right),
\]

where $c_{1,1}$, $c_{1,2}$, and $c_{1,3}$ are only related to $\mathrm{E}[\Psi]$ and $\theta$. The trend condition (ii) is satisfied with $\lambda_1 = O(1/n)$. By definition, it can be verified that $f_\iota$, $\iota \in \mathcal{F} \cup \{0\}$, satisfy the Lipschitz condition (iii). The initial condition (iv) holds with $\sigma = O(n^{-1/6})$.

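Wormald's method leads us to consider the system of differential equations

\begin{align*}
\frac{d\rho_{d,r}(\tau)}{d\tau} &= f_{d,r}\left(\tau, \rho_0(\tau), (\rho_{d',r'}(\tau))_{(d',r') \in \mathcal{F}}\right), \qquad (d, r) \in \mathcal{F}, \\
\frac{d\rho_0(\tau)}{d\tau} &= f_0\left(\tau, \rho_0(\tau), (\rho_{d',r'}(\tau))_{(d',r') \in \mathcal{F}}\right),
\end{align*}

with the initial condition $\rho_{d,r}(0) = \rho_{d,r}$, $(d, r) \in \mathcal{F}$, and $\rho_0(0) = \sum_r \rho_{r,r}$. The conclusion (a) of Theorem 5 shows the existence and uniqueness of the solution of the above system of differential equations. We solve the system of differential equations explicitly in Appendix V.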

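Let $\lambda = O(n^{-1/6})$. By the conclusion (b) of Theorem 5, we know that for a sufficiently large constant $C_1$, with probability $1 - O\left(n\gamma + \frac{\beta}{\lambda} \exp\left(-\frac{n\lambda^3}{\beta^3}\right)\right)$,

\begin{align*}
|\tilde R_{d,r}(t) - n \rho_{d,r}(t/n)| &= O(n^{5/6}), \qquad (d, r) \in \mathcal{F}, \\
|\tilde R_0(t) - n \rho_0(t/n)| &= O(n^{5/6})
\end{align*}

uniformly for $0 \le t \le \bar\tau n$, where $\bar\tau$ is defined in Theorem 5. Increase $n$ if necessary so that $\frac{\beta}{\lambda} \exp\left(-\frac{n\lambda^3}{\beta^3}\right) = n^{7/24} \exp(-n^{1/8}) > n\gamma$ and $C_1 \lambda < \frac{\eta}{2}\theta$, which implies $\bar\tau \ge (1 - \eta)\theta$. So there exist constants $c_0$ and $c'_0$ such that the event

\[
E_0 = \left\{|\tilde R_0(t)/n - \rho_0(t/n)| \le c_0 n^{-1/6},\ 0 \le t \le (1 - \eta)K\right\}
\]

holds with probability at least $1 - c'_0 n^{7/24} \exp(-n^{1/8})$.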
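The exponents in the probability bound follow from the choices $\beta = n^{1/8}$ and $\lambda$ of order $n^{-1/6}$ by simple exponent arithmetic. The short check below is a worked example with hypothetical variable names. It tracks only the exponents of $n$ and ignores constant factors.

```python
from fractions import Fraction

beta_exp = Fraction(1, 8)      # beta = n^(1/8)
lambda_exp = Fraction(-1, 6)   # lambda of order n^(-1/6)

prefactor_exp = beta_exp - lambda_exp            # exponent of beta / lambda
inner_exp = 1 + 3 * lambda_exp - 3 * beta_exp    # exponent of n lambda^3 / beta^3
error_exp = 1 + lambda_exp                       # exponent of lambda n

print(prefactor_exp, inner_exp, error_exp)  # prints 7/24 1/8 5/6
```

So $\frac{\beta}{\lambda}\exp\left(-\frac{n\lambda^3}{\beta^3}\right)$ is of the form $n^{7/24}\exp(-n^{1/8})$, and the deviation $O(\lambda n)$ is $O(n^{5/6})$.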

Now we consider the two cases in the theorem to prove. (i) If $\rho_0(\tau) > 0$ for $\tau \in [0, (1 - \eta)\theta]$, then there exists $\epsilon > 0$ such that $\rho_0(\tau) \ge \epsilon$ for $\tau \in [0, (1 - \eta)\theta]$. Increase $n$ if necessary so that $c_0 n^{-1/6} < \epsilon$. Then, we have

\begin{align*}
\Pr\{T_0 > (1 - \eta)K\} &= \Pr\{\tilde R_0(t) > 0,\ 0 \le t \le (1 - \eta)K\} \\
&\ge \Pr\{E_0\} \tag{68} \\
&\ge 1 - c'_0 n^{7/24} \exp(-n^{1/8}),
\end{align*}

where (68) follows because under the condition $E_0$, for all $t \in [0, (1 - \eta)K]$, $\tilde R_0(t)/n \ge \rho_0(t/n) - c_0 n^{-1/6} > 0$. Since $\tilde R_\iota(t) = R_\iota(t)$, $\iota \in \mathcal{F} \cup \{0\}$, when $t \le T_0$, the first part of the theorem is proved.

(ii) Consider $\rho_0(\tau_0) < 0$ for $\tau_0 \in [0, (1 - \eta)\theta]$. There exists $\epsilon > 0$ such that $\rho_0(\tau) \le -\epsilon$ for all $\tau \in [\tau_0 - \epsilon, \tau_0 + \epsilon] \cap [0, (1 - \eta)\theta]$. Increase $n$ if necessary so that $c_0 n^{-1/6} < \epsilon$ and $n\epsilon > 1$. Then, we have

\begin{align*}
\Pr\{T_0 < (1 - \eta)K\} &= \Pr\{\tilde R_0(t) \le 0 \text{ for some } t \in [0, (1 - \eta)K]\} \\
&\ge \Pr\{E_0\} \tag{69} \\
&\ge 1 - c'_0 n^{7/24} \exp(-n^{1/8}),
\end{align*}

where (69) can be shown as follows. Since $n\epsilon > 1$, there exists $t_0$ such that $t_0/n \in [\tau_0 - \epsilon, \tau_0 + \epsilon] \cap [0, (1 - \eta)\theta]$. Hence, under the condition $E_0$, $\tilde R_0(t_0)/n \le c_0 n^{-1/6} + \rho_0(t_0/n) < 0$.

The proof of the theorem is completed by subtracting the probability that (67) does not hold. ■

Appendix V 
Solve the system of differential equations 

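We solve the following system of differential equations given in (20) and (21), which is repeated as follows:

\begin{align*}
\frac{d\rho_{d,r}(\tau)}{d\tau} &= \left(\alpha_{d+1,r}\, \rho_{d+1,r}(\tau) + \bar\alpha_{d+1,r+1}\, \rho_{d+1,r+1}(\tau) - \rho_{d,r}(\tau)\right) \times \frac{d}{\theta - \tau}, \qquad 1 \le r \le M,\ r \le d \le D, \\
\frac{d\rho_0(\tau)}{d\tau} &= \frac{\sum_{r=1}^{M} r\, \alpha_{r+1,r}\, \rho_{r+1,r}(\tau) - \rho_0(\tau)}{\theta - \tau} - 1,
\end{align*}

with $\rho_{d,r}(0) = \rho_{d,r}$ and $\rho_0(0) = \sum_{r=1}^{M} \rho_{r,r}$, where $\bar\alpha_{d,r} = 1 - \alpha_{d,r}$.

Let $y_{d,r}(\tau) = (1 - \tau/\theta)^{-d} \rho_{d,r}(\tau)$. We have

\[
\frac{dy_{d,r}(\tau)}{d\tau} = \frac{d}{\theta}\left(\alpha_{d+1,r}\, y_{d+1,r}(\tau) + \bar\alpha_{d+1,r+1}\, y_{d+1,r+1}(\tau)\right).
\]

We see that $y_{d,r}(0) = \rho_{d,r}$. Define

\begin{align*}
\rho^{(0)}_{d,r} &= \rho_{d,r}, \\
\rho^{(i+1)}_{d,r} &= \alpha_{d-i,r}\, \rho^{(i)}_{d,r} + \bar\alpha_{d-i,r+1}\, \rho^{(i)}_{d,r+1}.
\end{align*}

We can verify that

\[
y_{d,r}(\tau) = \sum_{j=d}^{D} \binom{j-1}{d-1} (\tau/\theta)^{j-d} \rho^{(j-d)}_{j,r}.
\]

Thus

\[
\rho_{d,r}(\tau) = (1 - \tau/\theta)^{d} \sum_{j=d}^{D} \binom{j-1}{d-1} (\tau/\theta)^{j-d} \rho^{(j-d)}_{j,r}. \tag{70}
\]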
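The closed-form solution can be cross-checked against a direct numerical integration of the differential equations. The sketch below is an illustration under stated assumptions, not part of the original derivation. It takes $\alpha_{d,r} = (1-q^{r-d})/(1-q^{-d})$, reads the coefficient of $\rho_{d+1,r+1}$ as the complement $1 - \alpha_{d+1,r+1}$, treats $\rho_{d,r}$ as zero outside $1 \le r \le M$, $r \le d \le D$, and uses a classical fourth-order Runge-Kutta integrator. The initial values, $\theta$, $q$ and all function names are hypothetical.

```python
from math import comb


def alpha(d, r, q):
    """alpha_{d,r} = (1 - q^(r-d)) / (1 - q^(-d)) (assumed closed form)."""
    return (1 - q ** (r - d)) / (1 - q ** (-d))


def rhs(tau, rho, q, theta):
    """Right-hand side of the ODE for every (d, r); missing keys count as 0."""
    out = {}
    for (d, r), val in rho.items():
        gain = (alpha(d + 1, r, q) * rho.get((d + 1, r), 0.0)
                + (1 - alpha(d + 1, r + 1, q)) * rho.get((d + 1, r + 1), 0.0))
        out[(d, r)] = (gain - val) * d / (theta - tau)
    return out


def rk4(rho0, tau_end, steps, q, theta):
    """Integrate the system from 0 to tau_end with classical RK4."""
    h, tau, rho = tau_end / steps, 0.0, dict(rho0)
    for _ in range(steps):
        k1 = rhs(tau, rho, q, theta)
        k2 = rhs(tau + h / 2, {k: rho[k] + h / 2 * k1[k] for k in rho}, q, theta)
        k3 = rhs(tau + h / 2, {k: rho[k] + h / 2 * k2[k] for k in rho}, q, theta)
        k4 = rhs(tau + h, {k: rho[k] + h * k3[k] for k in rho}, q, theta)
        rho = {k: rho[k] + h / 6 * (k1[k] + 2 * k2[k] + 2 * k3[k] + k4[k])
               for k in rho}
        tau += h
    return rho


def rho_iter(rho0, j, r, i, q):
    """rho^{(i)}_{j,r} from the recursion, with rho^{(0)} = rho0."""
    if i == 0:
        return rho0.get((j, r), 0.0)
    d = j - i + 1  # index of alpha used when going from step i-1 to step i
    return (alpha(d, r, q) * rho_iter(rho0, j, r, i - 1, q)
            + (1 - alpha(d, r + 1, q)) * rho_iter(rho0, j, r + 1, i - 1, q))


def closed_form(rho0, d, r, tau, q, theta, D):
    """Closed-form rho_{d,r}(tau) as a finite sum over j = d..D."""
    s = tau / theta
    total = sum(comb(j - 1, d - 1) * s ** (j - d) * rho_iter(rho0, j, r, j - d, q)
                for j in range(d, D + 1))
    return (1 - s) ** d * total


# Hypothetical initial values for M = 2, D = 4 (illustration only).
rho0 = {(1, 1): 0.10, (2, 1): 0.20, (3, 1): 0.15, (4, 1): 0.05,
        (2, 2): 0.10, (3, 2): 0.20, (4, 2): 0.20}
q, theta, D, tau_end = 2.0, 1.5, 4, 0.6
numeric = rk4(rho0, tau_end, 2000, q, theta)
for key in sorted(rho0):
    print(key, numeric[key], closed_form(rho0, key[0], key[1], tau_end, q, theta, D))
```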



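Using the general solution of linear differential equations, we obtain that

\begin{align*}
\rho_0(\tau) &= (1 - \tau/\theta) \left(\int_0^{\tau} \frac{\sum_{r=1}^{M} r\, \alpha_{r+1,r}\, \rho_{r+1,r}(t)}{\theta - t} (1 - t/\theta)^{-1}\, dt + \theta \ln(1 - \tau/\theta) + \sum_{r \ge 1} \rho_{r,r}\right) \\
&= (1 - \tau/\theta) \left(\sum_{r=1}^{M} r\, \alpha_{r+1,r} \int_0^{\tau} \frac{\rho_{r+1,r}(t)}{\theta - t} (1 - t/\theta)^{-1}\, dt + \theta \ln(1 - \tau/\theta) + \sum_{r \ge 1} \rho_{r,r}\right). \tag{71}
\end{align*}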
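The integral in (71) can be further calculated as follows:

\begin{align*}
\int_0^{\tau} \frac{\rho_{r+1,r}(t)}{(\theta - t)(1 - t/\theta)}\, dt
&= \int_0^{\tau} \frac{1}{\theta} \sum_{j=r+1}^{D} \binom{j-1}{r} (1 - t/\theta)^{r-1} (t/\theta)^{j-r-1} \rho^{(j-r-1)}_{j,r}\, dt \tag{72} \\
&= \sum_{j=r+1}^{D} \rho^{(j-r-1)}_{j,r} \binom{j-1}{r} \int_0^{\tau/\theta} (1 - t)^{r-1} t^{j-r-1}\, dt \\
&= \sum_{j=r+1}^{D} \rho^{(j-r-1)}_{j,r} \binom{j-1}{r} B(j-r, r)\, I_{j-r,r}(\tau/\theta) \tag{73} \\
&= \frac{1}{r} \sum_{j=r+1}^{D} \rho^{(j-r-1)}_{j,r}\, I_{j-r,r}(\tau/\theta),
\end{align*}

where (72) is obtained by substituting $\rho_{r+1,r}(t)$ in (70), and (73) is obtained by the definition of incomplete beta function (cf. (57)).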
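After the incomplete beta function is introduced, each term of the sum carries the constant $\binom{j-1}{r} B(j-r, r)$. The check below is a small illustration with hypothetical names. It confirms with exact arithmetic that this constant equals $1/r$ for every $j > r$, assuming the integer-parameter formula $B(a,b) = (a-1)!(b-1)!/(a+b-1)!$.

```python
from fractions import Fraction
from math import comb, factorial


def beta(a, b):
    """Beta function for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def coefficient(j, r):
    """C(j-1, r) * B(j-r, r), the constant multiplying I_{j-r,r}."""
    return comb(j - 1, r) * beta(j - r, r)


for r in (1, 2, 3):
    print(r, [coefficient(j, r) for j in range(r + 1, r + 5)])
```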